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Introduction 



Mathematics and biology have a synergistic relationship. The linkage between these two 
sciences is embodied in the reciprocal contributions they make to each other: biology 
generates complex problems and mathematics can provide ways to understand them. 
Indeed, mathematics provides a simple representation of the reality through the creation 
of models and their solution by the extensive use of numerical and symbolic computational 
software. 

In many cases, understanding quantitatively a natural process, leads to a differential 
equation model. 

These considerations led our attention to the study of a new method for the numerical 
solution of ODEs systems which arise from biological models. In particular, we consider 
a method which makes use of a Radial Basis Function Network (RBFN) and it is applied 
to a biological model for diabetes and insulin therapy. 

A RBFN, as the name says, is the union of two powerful tools: Radial Basis Functions 
(RBF) and Neural Networks. Both these tools have gained much attention in recent 
years thanks to the fact that they can be applied to many different areas of science and 
engineering. 

For what concerns RBF, they were used for the first time in the solution of the real 
multivariate interpolation problem (cf. [2]) and their great success is mainly due to the 
fact that they are a meshless method, i.e. they are not related to a grid (see Chapter 1). 
With the term "Neural Network" we improperly indicate Artificial Neural Network 
(ANN), i.e. a net of artificial neurons that can be used both to gain an understanding of 
biological neural networks and to solve artificial intelligence problems without necessarily 
creating a model of a real biological system. The architecture of an ANN is inspired by 
the brain structure. Indeed, an ANN is a group of interconnected units, called neurons, 
which are related by linkages, called synapses, and organized in layers. The data process- 
ing takes place in the neurons. From a mathematical point of view this means that the 
input data are passed to a function, typically non-linear, called activation function. 
In this thesis we focus on the specific case of a RBFN, that is an artificial neural network 
whose activation function is a RBF (see Chapter 2). This choice is motivated by the 
fact that a RBFN is a "universal approximator", in the sense that almost every function 
can be approximated by an artificial neural network (cf. [11, 12]). Moreover, differently 
from a common approach which used a backward propagation algorithm to teach the 
artificial network how to perform a task, in the case of a RBF network the learning phase 
consists in the solution of an interpolation problem. Thus we do not need to investigate 
the convergence problems of the backward propagation algorithm and in the meanwhile 
we also reduce the computational cost of the network training phase. 
Once we have an explicit expression for the RBF network, we proceed in three steps: 



1. study of a method for the simultaneous approximation of a function and its deriva- 
tives (see Chapter 3); 

2. use of the previous result to solve an ordinary differential equation of arbitrary order 
(see Chapter 4); 

3. application of the RBF theory to the vectorial valued case and use of this theory, 
together with the result in 2, to solve an ODEs system (see Chapter 5). 

For what concerns the first step, we refer to [6]. The key idea in that paper is that if 
we approximate the function with the RBFN, then, simply differentiating the network, 
we may obtain an approximation also for the derivatives. This leads to what they call 
a Direct Radial Basis Function Network method (DRBFN). On the other hand, if we 
approximate the nth derivative with the RBFN and then we integrate it, we obtain the 
Indirect Radial Basis Function Network method (IRBFN). 

We do not limit to the numerical experiments and we make a deep analysis on the network 
functioning. It will be clear from these numerical results that some RBF provide better 
accuracy than others. This is the reason why we have repeated the simulations with 
different kernels. 

A first contribution in this thesis is the observation that this different accuracy can be 
justified mainly by two reasons. The first concerns the kernel we choose for the approxi- 
mation: those kernels ip such that ip(r) — ► oo as r — ► oo have better localization properties 
compared with those ones that tend to zero as r — ► oo. It is worth noting that localization 
properties are highly important to the success of general approximation methods. The 
second reason is related to the function / to be approximated. It is well known that a 
good approximation is guaranteed only if / belongs to the native space of the kernel ip. 
We did also an error analysis using error estimates in [7] , for different kind of functions 
and kernels. 

We also state an existence theorem for the RBFN, under some assumptions on the radial 
function ip, the proof is in [12]. 

For numerical experiments of the second step above we refer to [9]. Moreover, since in the 
numerical construction of RBFN or in the solution of the ordinary differential equation, 
we face with the solution of an ill conditioned linear system, an analysis of the matrix 
condition number has been necessary. Thus, we illustrate two methods to reduce the 
conditioning: the Riley method [16] and a domain scaling. The results suggest that both 
these approaches depend on the specific example we take into account. However the 
scaling performs much better. 

Finally, in Chapter 5, we extend the RBF approach to the vectorial case. All the theo- 
retical results of Chapter 1 are still valid if instead of considering one single kernel, we 
consider a matrix of kernels. In particular the interpolation problem, which arises from 
the RBFN definition, is well posed and admits unique solution if we multiply the matrix 
kernels by a positive definite matrix, say C. How the coefficients of C should be chosen 



is not automatically determined. Thus, a parameter analysis has been done, see Section 

5.2. 

The last Chapter can be seen as an original generalization of the previous. Indeed the 

RBFN permits the representation of the ODEs system in a compact form, which reflects 

the matrix form of the interpolation problem in the vectorial case. 
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Chapter 1 

Radial Basis Functions: essential 

theory and properties 



In this first chapter we summarize the main definitions and results about Radial Basis 
Functions (we can shorten in RBF) in the case of scalar-valued data. All the details on 
the Radial Basis Functions theory and theorems proofs are in [1]. 

Before giving the formal definition of a Radial Basis Function, we briefly introduce the 
problem of scattered data interpolation. 

Problem 1.1 Given data (pi.j,yj), j = 1, ...,N, with x^ G R s , yj G R, find a continuous 
function Pf such that Pf(x.j) = yj, j = 1, ..., N. 

One common approach to solve the scattered data interpolation problem is to assume 
that the function Pf is a linear combinations of certain basis function B k . 

N 

P / (x) = ^c fc ,B fc (x), xGR s (1.1) 

fc=i 

This leads to a system of linear equations of the form 

Ac = y (1.2) 

where the entries of the interpolation matrix are Ajk = -Bfc(xj), j,k = 1, ..., N and c = 
[d,...,c N } T , y = [y u ...,y N ] T . 

In the univariate setting Problem 1 is always solvable. In the multivariate setting, on the 
contrary, the following result holds: 

Theorem 1.2 If ft C R s , s > 2, contains an interior point, then there exist no Haar 
spaces of continuous functions except for one- dimensional ones. 

This theorem tells us that if we want to have a well-posed multivariate scattered data 
interpolation problem we can no longer fix in advance the set of basis functions we plan 
to use for interpolation of arbitrary scattered data. 

As an example in I = [0,1], we can construct the piecewise linear spline interpolant 
assuming Pf of the form 

N 

Pf{x) = ^c k \x - x k \, are [0,1] (1.3) 

fc=i 
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and then determine the coefficients c& imposing the interpolation conditions 



(1.4) 



The points x*., to which the basic function is shifted, are usually referred as centers. While 
the points in which the function is computed or sampled are the data sites. There are no 
motivations in choosing the centers different from the data sites, so usually they coincide. 
Since the functions B^ are radially symmetric about their centers x*., this constitutes the 
first example of radial basis function. 

We can generalize this approach to the interval [0, l] s considering the Euclidean norm of 
the shift. In this case the interpolation matrix is: 



/llxi 
|x 2 



x ll 

Xll 



l x l 

|x 2 



x 2 | 
x 2 | 



l X l 

|x 2 



x 3l 
X3I 



l X l 

|x 2 



x/v||\ 
xat|| 



Xjv -Xi 



Xat -x 2 



|Xjv -x 3 | 



Xat -Xjv 



This matrix is called distance matrix. 

In the following we give the formal definition of RBF. 



Definition 1.1 A function $ : R s — ► R is called radial provided there exists a univariate 
function (p : [0, 00) — ► R such that 



where 



$(x) = (p(r), r = ||x|| 

is some norm on R s , usually the Euclidean norm. 



(1.5) 



From the definition above we can observe that $ is radially symmetric about its center. 
Indeed, if ||xi|| = ||x 2 || then 3>(xi) = < l ) (x 2 ) where x l7 x 2 £ R s . It follows that the 
simplest example of radial basis function is the Euclidean distance we used to solve the 
interpolation problem, i.e. ip(r) = r. 
Now, after these considerations we can write the interpolation matrix as 



/V(||xi-xi||) ¥?(||xi-x 2 ||) v?(l|xi-x 3 || 
V(l|x 2 -x 1 ||) ¥?(|jx 2 — x 2 |j) </?(||x 2 -x 3 || 

^(||xjv-xi||) ^(|| Xi v-x 2 ||) ^(||xjv-x 3 | 



•• v?(l|xi -x 7V ||)\ 
•• ^(||x 2 — XiV ||) 



V?(||X7V -Xjv 



Example. The most important example of radial basis function is the Gaussian. 

(p(r) = e - (er)2 , rGR (1.6) 



9 



where e is a shape parameter and is related to the variance a 2 of the normal distribution 
function by e 2 = i. In particular, a smaller value of e causes the function to become 
flatter, while an increasing e leads to a more peaked RBF (see figure 1.1 in the case e = 2). 






V 



-1 -1 



(a) Gaussian with e = 0.05 
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(b) Gaussian with e = 2 
Figure 1.1: Gaussian for different values of the shape parameter 

If we compose the Gaussian with the euclidean norm we obtain, for any fixed center x^, 
a multivariate function: 



$ fc (x) = e" 



e* x— Xfe | 



x e 



(1.7) 
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Clearly, the connection between $ and <p is given by 

$ fc (x) = v?(||x-x fc ||). (1.8) 

Before talking about different radial basis functions, we introduce the fill distance which 
is a measure of the data distribution: 

h = hx,n = supmin||x — x j 1 1 2 . (1-9) 

1.1 Positive Definite Functions 

We have seen that the scattered data interpolation problem leads to the solution of a 
system of linear equations with matrix A. 

It is well known, that the system has a unique solution whenever the matrix A is non- 
singular. Clearly this happens if the matrix is positive definite. 

Definition 1.2 A real symmetric matrix A is called positive semi-definite if its associated 
quadratic form is non-negative, i.e., 

N N 

for all c= [c 1} ...,c N ] T eR N . 

If the quadratic form is zero only for c = then A is called positive definite. 

All the eigenvalues of a positive definite matrix are positive, so the determinant, which is 

the product of eigenvalues, is always positive and the matrix is non-singular. 

What is more, if we have basis functions Bk that generate a positive definite interpolation 

matrix, we will always have a well-posed interpolation problem. 

To this end we introduce the concept of a positive definite function. 

Definition 1.3 A complex-valued continuous function $ : W — ► C is called positive 
definite on M. s if 

N N 

3=1 fc=i 

for any N pairwise different points x l7 ..., x^r € M s and c = [ci, ..., Cn] t G C^. 

The function $ is called strictly positive definite on 1R S if the quadratic form (1.11) is zero 

only for c = 0. 

We can now illustrate some of the most important properties and characterizations of 
positive definite functions. Proofs of these properties can be found in [1]. 
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Theorem 1.3 Some basic properties of positive definite functions are: 

1. Non-negative finite linear combinations of positive definite functions are positive 
definite. If $i, ..., $7v are positive definite on W and Cj > 0, j = 1, ..., N, then 

N 

$(x) = Y^ c i $ i( x ) 5 xel s (1.12) 

i=i 

is also positive definite. Moreover, if at least one of the $., is strictly positive definite 
and the corresponding Cj > 0, then $ is strictly positive definite. 

2. $(0) > 0. 



3. $(-x) = $(x). 

4- Any positive definite function is bounded. In fact, 

|$(x)| < $(0). (1.13) 

5. //$ is positive definite with $(0) = then $ = 0. 

6. The product of (strictly) positive definite functions is (strictly) positive definite. 

The definition and the properties above suggest that we should use strictly positive definite 

functions as basis functions in the interpolation problem. 

In most of our work we will consider scalar-valued functions. 

In the following we provide a characterization for real-valued (strictly) positive definite 

functions. 

Definition 1.4 A real-valued continous function $ is positive definite on M s if and only 
if it is even and 

N N 

J2 Yl c J c k$(xj - x*) > (1-14) 

for any N pairwise different points xi, ..., x^ £ IR S and c = [c±, ..., Cn] t £ C N . 

The function <l> is called strictly positive definite on M s if the quadratic form above is zero 

only for c = 0. 

We can now give the most famous characterization of positive definite functions in terms 
of Fourier transforms established by Bochner in 1932. 

Theorem 1.4 A (complex-valued) function $ £ C(1R S ) is positive definite on W s if and 
only if it is the Fourier transform of a finite non-negative Borel measure jd on W , i.e. 

$(x) = A(x) = -=L= / e-* xT ^My), x £ R* (1.15) 

V(27Tj s Jrs 
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Proof We will prove only the easy direction, i.e. we will prove that assuming $ the Fourier 

transform of a finite non-negative Borel measure then $ is positive definite. 

Thus, 

iV TV N N r 



j=l k=\ 

N N 



dn(y) 



VW) 



j=i fc=i 

V 



= / lE c ^ xry | 2rf My)>o 



where the last inequality holds because of the assumptions imposed on the measure \x. 

□ 

To guarantee that the interpolation problem will be well-posed we extend Bochner 's 
theorem to the case of strictly positive definite functions. 

To obtain this result we should introduce the concept of carrier of a (non-negative) Borel 
measure defined on some topological space X: 

X \ U {O : O is open and fi(0) = 0} (1.16) 

Theorem 1.5 Let fi be a non-negative finite Borel measure on M s whose carrier is a set 
of nonzero Lebesgue measure. Then the Fourier transform of fi is strictly positive definite 



Proof Theorem's proof proceeds exactly as in the Bochner's theorem. 

Thus, we obtain that the Fourier transform is positive definite. To show that is strictly 

positive definite, let 

N 

i=i 
and assume that the points Xj are all distinct and c/0. 

In this case the functions y — > e~ lx ' y are linearly independent, so that g ^ 0. Since g is an 
entire function, its zero set can have no accumulation point and therefore it has Lebesgue 
measure zero. 

Now, the only remaining way to make the above inequality an equality is if the carrier of 
fi is contained in the zero set of g, i.e. has Lebesgue measure zero. 
But this is ruled out in the hypothesis of the theorem. □ 

Finally, we state a criterion to check whether a given function is strictly positive definite: 
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Theorem 1.6 Let <f> be a continuous function in Li(M. s ). $ is strictly positive definite 
if and only if $ is bounded and its Fourier transform is non-negative and not identically 
equal to zero. 

Till now we haven't required the function $ to be radial. In fact, the function 

N 

P / (x) = ^c fc $(x-x fc ), xeK s (1.18) 

fe=i 

will yield an interpolant that is only translation invariant. If we want invariance also 
under rotations and reflections we should use positive definite functions that are radial 
on R s . 

Lemma 1.1 //$ = <p(\\ ■ ||) is (strictly) positive definite and radial on M s then $ is also 
(strictly) positive definite and radial on R a for any a < s. 

1.1.1 Examples of strictly positive definite RBF 

In this section we give some examples of strictly positive definite radial functions which 
we will use later for our purposes. To show that they are strictly positive definite we will 
use the criterion in Theorem (1.6). 

• Gaussians: 

$(x) = e" e2||x|12 e>0 (1.19) 

The Fourier transform of this function is a Gaussian and this explains why the 
function is strictly positive definite. 

I ii^ii 2 

$(w) = — -7= — e~^ (1.20) 

• Matern Functions 

iv f (iixii)iixiri s 

$(X) = a \ TWs P > ~ (1-21) 

v ; 2' 3 - 1 r(/?) H 2 K ' 

where K v is the modified Bessel function of the second kind of order v. 
The Fourier transform of the Matern functions is given by the Bessel kernels 

$(a;) = (l + ||a;|| 2 )-' 3 >0 (1.22) 

Therefore the Matern functions are strictly positive definite on M s Vs < 2j3. 



14 
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Figure 1.2: Matern with j3 = *±i 
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Figure 1.3: Matern with /3 = ^ 



Generalized Inverse Multiquadrics 



$(a;) = (1 + ||x|| V, (3> 



(1.23) 
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This is exactly the Fourier transform of Matern function. 

To show that is positive definite we need Hankel inversion theorem, which states 
that the Fourier transform for radial functions is its own inverse, i.e. 

T*\TsV\ = ^ (1-24) 

Thus, using Hankel's theorem, we can reverse the role of the function and of the 
Fourier transform in the previous example and we obtain that generalized inverse 
multiquadrics are strictly positive definite for s < 2j3. 

When (3 = \ we have the particular case of Hardy's inverse multiquadric. 

1.2 Conditionally Positive Definite Function 

Definition 1.5 A real symmetric matrix A is called conditionally positive semidefinite of 
order one if its associated quadratic form is non-negative, i.e. 

N N 

j=i fe=i 
for all c = [ci, ...,Cjv] £ ^& N that satisfy 

N 

J> = (1.26) 

i=i 

J/c^O implies strict inequality in (1.25) then A is called conditionally positive definite 
of order one. 

The reason to introduce conditionally positive definite matrices (and later conditionally 
positive definite functions) rises from the fact that sometimes is desirable to have an 
interpolant which exactly reproduces simple polynomial functions. 

For example if the data are constants or come from a linear function, it would be nice if 
our interpolant were also constant or linear, respectively. 

The problem is that the methods we have seen so far do not reproduce even simple 
polynomial functions. 

A simple remedy to this problem is to modify the assumptions on the form of the inter- 
polant adding a polynomial basis. 
This leads to the following formulation of the interpolation problem: 

N M 

p/(x) = J2 c *¥>(ii x - xfcii) + J2 <W x ) x G RS ( L27 ) 

fc=l 1=1 



16 Chapter 1. Radial Basis Functions: essential theory and properties 

where Pi,...,Pm form a basis for the M = ( m „^_^ s )- dimensional linear space H s m _ 1 of 

polynomials of total degree less than or equal to m — 1 in s variables. 

To obtain a square matrix and thus a unique solution we also add the following conditions 

N 

J2ckPi(*k) = (1.28) 

fc=i 

Thus we obtain the following augmented system: 

A p \ f c \ _ (y 
p t o) \d) \o 

where A jk = ^(|| Xj - x fc ||), j,k = l,...,N, P jt = p z (x fc ) k = 1,...,N, I = 1,...,M, 

c = [ci, ..., Cat] t , d = [di, ..., dM\ T i y = [yi, ■■■,Vn] T , is a zero vector of length M and O 

is a M x M zero matrix. 

Now we need conditions for the augmented matrix to be non-singular: 

Theorem 1.7 Let A be a real symmetric N x N matrix that is conditionally positive 
definite of order one, and let P = [1, ..., 1] T be a N x 1 matrix. 
Then the system of linear equations 



P T \d lo 



T nll.l-UI (1-29) 



is uniquely solvable. 



We can observe from the previous theorem that since strictly positive definite matrices 

are also conditionally positive definite of order one, the (augmented) radial basis function 

interpolation matrix for reproduction of polynomials of order (i.e. scalar constants) 

is non-singular. This means that strictly positive definite matrices reproduce exactly 

constants. 

Now, we can generalize the results above introducing conditionally positive definite and 

strictly conditionally positive definite functions of order m. 

Clearly, these functions provide the natural generalization of RBF interpolation with 

polynomial reproduction. 

Definition 1.6 A complex-valued continous function $ is called conditionally positive 
definite of order m on M s if 

N N 

J2 Yl c A$(xj - x fc ) > (1.30) 

j=l k=l 

for any N pairwise distinct points Xi, ...,xjy G M s and c = [ci, ...,cn] t G C n satisfying 
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N 

J>p(x,) = (1.31) 

i=i 
for any complex-valued polynomial p of degree at most m — 1. 

The function $ is called strictly conditionally positive definite of order m on W if the 
quadratic form above is zero only for c = 0. 

The following lemma gives an immediate property of conditionally positive definite func- 
tions. 

Lemma 1.2 A function that is (strictly) conditionally positive definite of order m on 
M s is also (strictly) conditionally positive definite of any higher order. In particular, a 
(strictly) positive definite function is always (strictly) conditionally positive definite of any 
order. 

As we have done for (strictly) positive definite functions, we can give a characterization 
also for scalar- valued functions. 

Definition 1.7 A real-valued continous even function $ is called conditionally positive 
definite of order m on 1R S if 

N N 

J2 Yl c J c kH*i - x fc ) > (1.32) 

for any N pairwise distinct points Xi, ...,xjv £ K s and c = [ci, ...,Cn] t € C N satisfying 

N 

J>p(x,) = (1.33) 

i=i 
for any real-valued polynomial p of degree at most m — 1. 

The function $ is called strictly conditionally positive definite of order m on 1R S if the 
quadratic form above is zero only for c = 0. 

Now we can state the generalization of theorem 1.7 for strictly conditionally positive 
definite functions of order m. 

Theorem 1.8 If the real-valued even function $ is strictly conditionally positive definite 
of order m on 1R S and the points x 1; ...,xjy form an (m — l)-unisolvent set, then the system 
of linear equations (1.29) is uniquely solvable. 

Where a set is said to be (m — l)-unisolvent if the only polynomial of total degree at most 
m — 1 interpolating zero data is the zero polynomial. 

We can also give an integral representation for conditionally positive definite functions in 
terms of the Fourier transform. 
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Theorem 1.9 Suppose the complex-valued function $ owns a generalized Fourier trans- 
form $ of order m which is continous on M s \{0}. Then <f> is strictly conditionally positive 
definite of order m if and only if $ is non-negative and non-vanishing. 

1.2.1 Examples of conditionally positive definite functions 

We list some examples of strictly conditionally positive definite functions that will be 
useful later for our purpose. 

• Generalized Multiquadrics 

The Multiquadric Radial Basic Function 

$(x) = (l + ||x||y, xet s Jet\N (1.34) 

was used for the first time, to solve an interpolation problem, in 1968 by Iowa 
State University Geodesist Roland Hardy. He then described his method in a paper 
published in 1971 ([2]). 

His study was motivated by a cartography problem who had tried to solve using 
Trigonometric and Algebraic interpolation methods, both of which were unsatisfac- 
tory. 

The method remained unnoticed till 1979 when Richard Franke, in a study done at 
the Naval Postgraduate School, concluded that Hardy 's interpolation method was 
the best. 

What is more, he conjectured that the system matrix of the method was invertible 
and that the method was well posed. 

The matrix invertibility was proved in 1986 by Charles Micchelli. 

Now, we can prove that generalized multiquadrics are strictly conditionally positive 
definite using generalized Fourier transform: 

2 1+/3 

$ ^) = f7^y l|a;|l ~ /3 ~ l ^ + t (l|a;|l) ' w ^° (L35) 

This transform is of order m = max(0, [/?]). 

As before, K v is the modified Bessel function of the second kind of order v. 

Since the generalized Fourier transform is positive with a singularity of order m in 
the origin, generalized multiquadrics are strictly conditionally positive definite of 
order m = \/3~\ (and higher). 

If j3 < the Fourier transform is the classical one and we obtain again the inverse 
generalized multiquadrics of the previous section that are strictly conditionally pos- 
itive definite of order m = that means they are strictly positive definite. 



1.2. Conditionally Positive Definite Function 
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Funzione di Hardy 







/ 



(a) Hardy's Multiquadrics obtained with /3 = 4 
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Multiquadrica generalizzata con beta=5/2 
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(b) Multiquadric with (3 



Thin Plate Splines 



$(x) = ||x|| 2/3 log(||x||), xeR s ,peN 



(1.36) 



This function has generalized Fourier transform 



$H = (-i)^ +1 2^- 1+ fr(^ + -)p\\\u\\- s - 2f3 



(1.37) 



of order m = (3 + 1. Therefore the Thin Plate Splines are strictly conditionally 
positive definite of order m = j3 + 1. 
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Figure 1.4: Thin Plate Spline with f3 = 1 

1.3 Reproducing Kernel Hilbert Space 

In this section we associate with each (strictly positive definite) radial basic function a 

certain space of functions called its native space. This gives a connection with the concept 

of Reproducing Kernel Hilbert Space (RKHS). 

We will generalize this theory in the last chapter for the vectorial case where we will study 

the problem of interpolate a vector- valued function. 

The following statement is a formal definition: 

Definition 1.8 Let TC be a real Hilbert space of function f : Q(c R s ) — > R with inner 
product (•, -) n . A function K : ft x ft — > R is called reproducing kernel for TC if 

1. K(-,x) ETC Vxefi 

2. /(x) = (f,K(-,x)) H VfeH and Vx e ft 

The name reproducing kernel is inspired by the reproducing property 2 in the definition 

(1.8). 

A RKHS is unique and satisfies the following further properties: 



Theorem 1.10 Suppose TC is a Hilbert space of functions f : Q 
kernel K . Then we have: 



with reproducing 
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1. K(x, y) = (K(; y),K(-,x)) H VxjeH 

2. K(x, y) = K(y,x) forx,yefL 

3- if\\f-fn\\H^0 forn^oo Men |/(x) - / n (x)| - VxeO 

The last property means that convergence in Hilbert space norm implies pointwise 
convergence. 

From Riesz representation theorem we have that existence of a reproducing kernel is 
equivalent to the fact that the point evaluation functional 5 X are bounded in ft. This 
means that there exists a positive constant M depending on x such that 

\5 x f\ = \f(x)\<M\\f\\ H V/eft, xel] (1.38) 

This observation is very useful because gives us a connection with the vectorial case where 
the same theory is valid. Indeed, we will see in the next chapters, that reproducing kernel 
in the multidimensional case satisfies almost the same properties of the scalar case. 
Now we would find a connection between the reproducing kernel, which is known to be 
positive definite, and strictly positive definite function. 
One direction of this connection is given by the following theorem: 

Theorem 1.11 Suppose 7i is a RKHS with reproducing kernel K : fi x Q — > 1R. Then 
K is positive definite. Moreover K is strictly positive definite if and only if the point 
evaluation functionals 5 X are linearly independent in TC* . 

Now we show that every strictly positive definite radial basic function can be associated 
with a reproducing kernel Hilbert space, its native space. 
Indeed, for all the functions / in the space TC, we can write: 

N 

f = J2cjK(-,*j) Xj-Gfl (1.39) 

3=1 

As a consequence of Theorem 1.10 we have that 

h = (fif)n 

N N 

j=l fe=l 

N N 

= 5252 c 3 c k( K (-, x 3), K (-, *k))H 

3=1 fc=l 

N N 

= 5252 c i CkK ^j^- 

3=1 k=l 
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So, we can define the (possibly infinite-dimensional) space 

H($l) = span {K(-,y) :yeO} (1.40) 

with the associated bilinear form (-,-)k given by 

N K N K N K N K 

(52 c J K (-> x i)' 52 dkK ^ yk ^ K = 52 52 c A K ^^yk)- (i.4i) 

j=i fc=i i=i fc=i 

where JV/r = oo is also allowed. 

Theorem 1.12 If K : Q x Q — > M. is a symmetric strictly positive definite kernel, then 
the bilinear form (-,-)k defines an inner product on Hk(&)- Furthermore, Hk(Q) is a 
pre-Hilbert space with reproducing kernel K . 

Since this space is not complete, we define the native space Nk(Q) to be the completion 
of H K (Q) with respect to the K-norm || • \\ K so that ||/||k = ||/||jv k (n) f° r all / G H K (Q). 
From the construction above, we have that we can think K(x, y) = <I>(x — y), where $ is 
a strictly positive definite radial function. 

1.3.1 Examples of Native Spaces 

In this subsection we give some examples of native spaces associated to a class of radial 
basis function. 

• Any strictly positive definite function $ whose Fourier transform decays only alge- 
braically has a Sobolev Space as its native space. 

In particular Matern functions has A/$^(M S ) = IFf (M s ) with /3 > | as native space. 

Where W™{W S ) = {/ G L 2 (R S ) n C(R S ) : /(-)(1 + || • \\ 2 2 ) n ^ 2 G L 2 (M S )| is the 
Sobolev space. 

An equivalent definition for the set above is: 

W^iVt) = {/ G L 2 (Q) n C(Q) : D a f G L 2 {Q) V|a| < m, a G IT} (1.42) 



Wendland's compactly supported functions, that we will introduce later, have 



native spaces A4 afc (M s ) = W 2 s/2+fe+1/2 ' TO ^ 



the native spaces for Gaussians and Inverse Multiquadrics are rather small. 
Nevertheless, Gaussian 's native space contains the class of so-called band-limited 
functions, i.e. functions whose Fourier transform is compactly supported. 

These functions play an important role in sampling theory where the main result is 
Shannon's theorem: 
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Theorem 1.13 Suppose f G C(1R S ) fl L^IR*) such that its Fourier transform van 

l l 

2> 2 



ishes outside the cube Q = [—h, h] s ■ Then f can be uniquely reconstructed from its 



values on Z s , i.e. 

/(x) = J]/(Osmc(x-0, xeR s (1.43) 



In the statement above sinc(x.) = Yld=i 



sill(7TX d ) 



The construction of native spaces for conditionally positive definite functions is 
rather technical, so we limited the discussion to strictly conditionally positive defi- 
nite functions. 

For example, for Thin Plate Splines the native space is the so-called Beppo-Levi 
space of order k: 

BL k = {f e C(R S ) : D a f e L 2 (R S ) for all |a| = k, a e W} (1.44) 



Chapter 2 

Radial Basis Function Networks 



An artificial neural network is a system, inspired to biological neural networks, which 

processes information and data. 

Even if nowadays computers are more powerful and faster than some years ago, they are 

still not able to solve problems that are trivial for a human being, such as the recognition 

of a specific object. 

This is due to the fact that the approach with which computers solve problems is by 

following a specific algorithm, i.e. a series of sequential instructions and operations, like 

the ones necessary to do complex computations or to store a big amount of data. 

Artificial neural networks are composed by a lot of units, called neurons, some of which 

receive information from the environment (input units), some of which give responses 

(output units) and finally others which are hidden inside the networks (hidden units). 

weights 
inputs 

— ^w,?K 

activation 
functon 

net input 




activation 



function 

threshold 
Figure 2.1: Artificial Neural Network 

Every unit becomes active if the amount of information it has received is more than a 

fixed threshold and then it emits a signal which is transmitted along the network till it 

reaches other units. 

Every connecting point between units can increase or decrease the signal as it happens 

with brain's synapse. For this reason these points are called synapse or even weight, 

since they weigh the intensity of the transmitted signal. 

Furthermore, an artificial neural network learns which are the correct association 

input-output from the presentation of different examples. Usually the learning phase is 

quite long and it requires the examples to be shown a lot of times. During this phase 

the network changes his synaptic connections following some learning rules and adapting 



26 Chapter 2. Radial Basis Function Networks 

itself to the problem is going to solve. 

Thanks to this structure, a neural network, simulating biological neural networks, 
performs tasks collectively and in parallel. Its main characteristics are: 

• Robustness: it is resistant against noise, since it gives a correct output even when 
the input is noisy or a part of the connections are destroyed. Clearly the higher the 
noise, the lower the accuracy. But the way the error increases is peculiar, it could 
increase uniformly or it could be present in response of some inputs and totally 
vanish for others. What is more, the advantage in using neural networks is that 
they can learn again and improve their performances. 

• Flexibility: thanks to the learning phase, a neural network has a lot of application 
fields, for example: information theory ( data compression, sonar signal recognition, 
alphanumeric character recognition), control systems, medical diagnosis (imaging 
analysis) and neuroscience. 

Moreover, since it learns from presentation of patterns, there is no need to know 
exactly the solution of a problem in all its details. It is sufficient to know the 
project's aim and some constraints to choose the neural network model in the right 
way. 

• Generalization: In the learning phase a neural network tries to memorize all the 
common characteristics of every shown pattern in order to use this information 
when dealing with unknown patterns. 

• Recovery: a neural network is able to recover data and information starting from 
some incomplete or corrupted data, in the same way human beings are able to 
remember something even if they have only partial information. 



2.1 Historical remark 

The first model of artificial neuron was created by Mc-Culloch and Pitts in the 1943. 
They proposed to assemble a lot of neurons to compute complex logic functions, but this 
first kind of artificial neural network was not able to learn and wasn't adaptive. 
Some years later, the Canadian psychologist Donald Hebb introduced the concept of 
learning for artificial networks. His idea was that if two neurons are acting simultaneously 
then their connection value is augmented. Even if Debb's studies were only qualitative 
they set the path for further investigations. 

In fact, during the fifties and sixties there was a big improvement mostly due to Frank 
Rosenblatt' s work. He created the first perceptron, a connection scheme of a neural 
network, and he developed also an iterative algorithm to correct the errors arising during 
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the solving process. 

In the same period, Widrow and his student Hoff, developed a similar algorithm but 
more effective, called delta rule or ADALINE. The idea was to give a measure of the error 
and then to adapt the neurons connections consequently, estimating the gap between the 
neural network response and the exact response. 

Afterwards, while in the second half of sixties Minsky and Papert analysis on per- 
ceptron model limitations reduced the initial enthusiasm towards artificial network, a 
renovated interest in this research field occurred in the eighties, with the worth work 
led by John Hopfield, who compared the functioning of a neural network to a physical 
system, and, in Japan, with the Fukushima work, who proposed a network for character 
recognition, later developed and modified. 

The full affirmation of neural network field arrived in 1986, when the back-propagation 
algorithm was published. This study actually represented the reply to Minsky and 
Papert's criticism, since it gave a recursive method, based on the delta rule, to modify 
neural network synaptic values. 

Since them till nowadays neural networks have known a great improvement and 
development in a lot of research fields. 

2.2 Neural networks structure 

In this section we analyse some common characteristics of a generic artificial neural net- 
work (For an extended treatment look at [3]). In the following sections we concentrate on 
Radial Basis Function Neural Networks (RBFN) [4]. 

2.2.1 Architecture 

As we said in the introduction to this chapter, artificial networks are composed by units 

or neurons. Every neuron is characterized by an activation threshold and by an activation 

function. 

The connection between two neurons is called synapse. 

In the following we indicate with x = {x\, ...,Xj, ...Xn} the input vector and with y = 

{yi, ...,yi, ■■■y n } the output vector (we suppose they do not necessarily have the same 

cardinality), so that we can write the response of the i-th neuron as: 

N 

y i = §{A) = §{Y,u ij x j -O i ) (2.1) 

i=i 

where we have supposed x composed by TV components and where $ is the activation 
function, 6 is the activation threshold vector and w is the vector of weights associated 
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to every synapse. The weights' vector changes during the learning phase to adapt the 
network and can assume positive or negative values. 

The activation function's choice determines the different responses of the net. We list 
some examples: 

• Staircase function: 

•M) = (i , A> " (2-2) 

(_ U otherwise 

_ J 1 A > 9 

\ — 1 otherwise 

In both cases the neuron can transmit only a bit of information. 

• Linear function 

$(A) = kA (2.4) 

• Sigmoid function 

•< A > = TT^ (2-5) 

where the constant fc controls the curve steepness; for instance if k — ► oo then the 
sigmoid function tends to the staircase function. 

For what concern network's architecture, what distinguishes a network from another is 

the number of input and output units. Using this as criterion, we can classify networks 

in hetero-associative or auto-associative. 

In hetero-associative nets input units are completely distinct from the output vector in a 

way that, for example, they can differ in the number of units which compose input and 

output. 

In auto-associative nets there is a unique layer of input-output such that all the units are 

connected together. This auto-referring feature entails the possibility of an instantaneous 

control of the exchanged information between input and output, and that's the reason 

why these nets are used to predict time dependent processes. 

Another distinguishing feature is the number of hidden layers. The most common 
architecture is a multi-layer architecture in which there is more than one hidden layer. 
In this structure, usually, the information travel from the lowest layer to the output. 
This kind of transmission is called feed-forward. 

2.2.2 Learning Phase 

One important step in the use of an artificial neural network is the learning phase. We can 
make a distinction between supervised learning, where you have a measure of error based 
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on the difference between expected response and the actual output, and unsupervised 

learning. 

In the first scheme, you exactly know the input pattern and the output pattern, thus this 

kind of learning involves different algorithms to measure the goodness of the response. In 

the latter, the main difference is that the desired response is not known or determined 

but there are a series of conditions the output should satisfy This kind of network can 

learn from the environment and can use some a priori knowledge to find a solution to an 

unknown pattern with good approximation. 

Anyway, all the learning algorithms have some common features: 

1. the initial values of the weights are chosen randomly in a small range (for example 
among —0.1 and 0.1) or are set to zero. 

2. The learning phase consists in the repeated presentation of training patterns. The 
weights are modified after the presentation of every pattern or after all the patterns 
have been shown. The new value of the weights is given by: 

^ = ^ + A^ (2.6) 

that means we have to determine only Aw*-. 

3. When the training is over, the weights' values are stored and one can test the network 
on unknown pattern (Generalization phase). 

2.3 Radial Basis Function Networks 

A Radial Basis Function Network is composed, as in the formulation given by Broomhead 

and Lowe [5], by an input layer, an output layer and by a unique hidden layer. 

The mathematical transformation from the input layer to the hidden layer, i.e. the 

activation function $ (see 3.5), is always non- linear, while the transformation from the 

hidden layer to the output (the function A) is linear. Thus, we are allowed to use radial 

basis functions as activation functions. 

The motivation for choosing a non-linear activation function is that the network can 

learn non-linearly separable transformations. We give a short explanation of this 

statement in the following. 

An example which involves non-linearly separable transformations is the problem 
of patterns classification. This means that, given a rule to classify (i.e. to distinguish a 
pattern from another) patterns, the network should be able to divide the input patterns 
into classes of similar patterns, on the base of the rule established from the beginning of 
the process. 
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The learning phase for this kind of problems consists in the creation of a separation 
vector for the input space, such that in the output every pattern is in its own class. 
If we imagine patterns as points in the space which are divided in two classes on the 
base of their coordinates, the problem of classification is to find a line or a plane or 
an hyperplane such that points in the first class lay on one side of the line or plane or 
hyperplane and points in the second class lay on the other side. If the separation line or 
plane or hyperplane exists, the problem is said to be linearly separable. 
The main theorem on the separability problem is Cover's Theorem, which also provide a 
justification for taking a high- dimensional hidden unit. 

Theorem 2.1 "A complex pattern- classification problem, cast in a high- dimensional 
space non-linearly , is more likely to be linearly separable than in a low- dimensional space. " 
(Cover 1965) 

We can state this theorem in a different way by saying : 

Consider a pattern of points x G X, each of which is assigned to a region of the space 
X + or X~ . We called this subdivision a dichotomy (binary partition). We say that the 
dichotomy is separable if there exists a surface separating the two classes of points X + 
and X~ . 

For every pattern x G X, consider now a set of real- valued functions 

{y?i(x)|i = l,...,m 2 } such that v?(x) = [^i(x),^ 2 (x),...,(/7 m2 (x)]. (2.7) 

If the pattern vector lives in a mi-dimensional space, every 

<p(x) = (^i(x),^ 2 (x),...,^ m2 (x)) (2.8) 

is a map from the m-i-dimensional input space into an m 2 -dimensional space. Function 
(fi is said to be an hidden function because it plays a role similar to the one a hidden unit 
plays in a feedforward neural network. 

Definition 2.1 A dichotomy {X + ,X~} of X is (p-separable if there exists an mi- 
dimensional vector w such that we may write: 



r . , .. _ ,-_ (2-9) 



w T ip(x) > x G X + 
w T (p(x.) < x G X 

The expression 

w>(x) = (2.10) 

define the separating hyperplane in the hidden space. The inverse image of this hyperplane 

x:™ T ^(x) = (2.11) 

defines the separating surface in the input space. 



2.3. Radial Basis Function Networks 



31 



This means that if <p maps the input pattern into a high dimensional space, then the 
problem can become linearly separable. 

Nevertheless, we have to observe that it is not always necessary to map the data 
in high dimensional space. Sometimes the use of a non-linear hidden function is sufficient 
to obtain the separability, as it is shown in the following example. 

The most famous case of non-linear separation problem is the representation of the XOR 
logic function. In the XOR problem there are four input points: 

(0,0), (0,1), (1,0), (1,1) 



The requirement is to create a classifier pattern which gives response for the inputs 
(0, 0), (1, 1) and 1 for the remaining. 

This problem cannot be solved by a single perceptron, i.e. by a neural network without 
hidden layers for which the output is given by 



N 

E 



x jWj 



(2.12) 



Indeed, a linear neural network can only solve linear separation problem like the repre- 
sentation of the AND logic function. 

We can show this with a figure: 
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Figure 2.2: Output from linear neural network 
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Figure 2.3: Output from non-linear neural network with hidden layer 



We build explicitly the network. 
Consider the two Gaussian functions: 

Pi (a?) = e" 
(p 2 (x) = e 



\\x—t\ 



\\ x ~ fell 



*1 

t-2 



[1,1] T 

[0,Cf 



(2.13) 
(2.14) 



The input patterns are mapped onto the plane (fi(x) — <fi2{x) such that the points 
(0, 0), (1, 1) are linearly separable from the other even if we have not mapped the points in 
a high dimensional space. In other words, non-linearity exemplified by the use of Gaussian 
hidden function is sufficient to transform the XOR problem into a linear separable one. 

From the discussion above we have learnt that a non-linear mapping is useful to transform 
a non-linearly separable problem in a linearly separable one. Now if we also consider that 
the function from the hidden layer to the output is linear, we can represent the network 
as the following application: 



s(x) : 



(2.15) 



We may think of the map s as an hypersurface in T C R mi+1 . Thus T is a multidimensional 
plot of the output as a function of the input. In a more general situation the data are 
contaminated by noise and the surface is usually unknown. So the learning phase and the 
generalization phase can be viewed as the following steps [5]: 

• the learning phase is the optimization of a fitting procedure for the surface V based 
on the presentation of the input and output patterns composed by the known data 
points. 

• Generalization is synonymous with interpolation among the data points. The in- 
terpolation is made along the constrained surface generated by the previous fitting 
procedure as the optimum approximation to true T surface. 
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This way we are led to a multivariable interpolation problem in a high dimensional space 
that can be stated as follow: 

Given a set of N distinct vectors (data sites) 

{xiGR mi |j = l,2,..,iV} 

and N real numbers {di\i = 1,2, ..., m}, choose a function s : M, N — ► R which satisfies the 
interpolation condition 

s(x 4 ) = d 4 (2.16) 

As we know from the previous chapter the radial basis function technique consists in 
choosing a function s such that: 

TV 

S (x) = J>^(||x-xJ) (2.17) 

where the function (p is a radial basis function. 

We can see that in order to obtain the weights that define the network, we are 
led to solve an interpolation problem. Practically speaking, this means that we have to 
investigate whether the interpolation matrix is singular or not. 



Chapter 3 

Simultaneous approximation of 
function and its derivatives 



In this chapter, referring to May-Duy and Tran-Cong papers (see [6, 9] for more details), 
we present a method to approximate simultaneously a function and its derivatives of any 
order, through the use of a RBF network. Then we apply this method to solve a system 
of ODE. 

3.1 Direct and Indirect Methods 

Let us take a function / : fl C R s — ► R. We want to approximate this function with 
positive definite radial basis functions. 

The functions we are interested in are Gaussians, Hardy Multiquadrics and Hardy Inverse 
Multiquadrics, which from now on we indicate with Multiquadrics and Inverse Multi- 
quadrics. Differently from the introductory chapter, in what follows, we consider a shape 
parameter strictly positive, such that: 



(3.1) 
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Inverse Multiquadrics 














$( r ) = 


= ¥>(l|x- 


-0,11) = 
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y/r* 


+ a? 



(3.2) 



(3.3) 
a i 

where a, is the width of the radial basis function, defined as 

Oj = j3di (3.4) 

(3 is a factor strictly positive and d, is the distance from the ith center c, to the nearest 

neighbouring center. 

With these information the RBF network can be written as: 
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s(x) = y^^y?(iix- d 



(3.5) 



i=l 



In order to determine the weights by using the general least square method (this corre- 
spond to the training of the network), we should impose the following conditions: 



/(xj) = s(xj) Xj eQ, j = 1, ...,n 
Thus, we are led to the following n x m system of linear equations: 



(3.6) 



/V(||xi-Ci||) ^(||xi-c 2 ||) ^(||x 1 -c 3 | 
V(||x2-cij|) </?(||x 2 -c 2 ||) ¥?(j|x 2 — c 3 | 



v(ll x l 

y?(||x 2 



c, 

c, 



\(^(||x n -ci||) ¥?(||x n -c 2 ||) v?(||x n -c 3 ||) ••• v?(||x n -c r 



w; 2 



2/2 



\WmJ \y n J 



In what follows we indicate with G G ]R nxm the matrix of the system and with w G K m 

the column vector of the weights to be determined. 

If we use the least square method to compute the weights Wi, we obtain the following 

system: 

(G T G)w = G T y (3.7) 

Moreover, if we consider a number of centers which equals the number of data sites (i.e 
m = n) the problem becomes an interpolation problem and so we are led to the square 
system: 

Gw = y. (3.8) 

Now we can easily compute the approximation of the function through 



/( x i) ~ ^2wM\\ x j ~ Ci 



Vxj E fl, j = 1, ...,n 



(3.9) 



i=i 



and the approximation of the derivatives simply differentiating the radial basis function 
network 



gk f 



dxj ■ ■ ■ dxi 



h..fa) « E 



Wi- 



d h ip 



dx-j ■ ■ ■ dxi 



(3.10) 



The authors called this method DRBFN, that means Direct Radial Basis Function Net- 
work. They tested the method on the following scalar valued function (i.e. fid) 



f(x) = x 3 + x + 0.5 -3<x<2 
choosing the parameter (3 = 2. 



(3.11) 



3.1. Direct and Indirect Methods 
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For this function we show the approximation procedure of the function itself and of its 

first and second derivatives. 

The Matlab script MayDRBFNtest.m tests the network on 250 equally spaced points 

using the weights computed by the Matlab function MayDRBFNtrain.m, which uses 50 

equally spaced points as centers and as data sites; indeed the centers coincide with the 

data sites. 

In Figure 3.1 we show the plots of the approximation by using the DRBF network, while 

in Table 3.1 the /2-error. 

Table 3.1: Error of DRBFN method 





Function 


First Derivative 


Second Derivative 


G 


6.4e - 01 


2.8e + 01 


Lie + 03 


IM 


7.2e-01 


3.0e + 01 


l.le + 03 


M 


9.2e - 02 


3.9e + 00 


1.5e + 02 



As we can see from the plots and from the error table, the approximation is not good. 
This is not surprising since the differentiation process is very sensitive to even a small 
level of noise. What we expect is that if instead of differentiating we integrate we should 
have a method less sensitive to noise. For this reason we can look for an indirect method. 
In fact, better results can be obtained by the following indirect method that we call 
IRBFN, i.e. Indirect Radial Basis Function Network. The key idea of the method is that 
we can simultaneously approximate a function and its derivatives up to order n, simply 
approximating the nth derivative with (3.9) and then computing n integrations. This 
means: 

; (n) (x)^E™i^(x) 

/(n-l) (x ) = f/(n)( x)dx ~ E™^ f $( x )dx = YZl W i H \*) + Cl 

(3.12) 



/(x) = / /«(x)dx « YZ 1 ^# n (x) + Cxx"" 1 + C 2 x" 



G„ 



As an example, we applied this indirect method to the approximation of the function 
(3.11) and of its first and second derivatives. 

Clearly, in this example we have to estimate the weights and two constants C\ and Ci- 
This leads to a non-square system (since we have two more columns). In order to obtain 
a square system we may use a number of centers such that: centers = datasites — 2. In 
this way, the last two rows of the matrix G provide the equations that allow to compute 
the constants C\ and C2. 
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Figure 3.1: Approximation with DRBFN 



(a) Function Approximation with DRBFN 

Approximation of the function on 250 target points uniformly spaced 
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(b) First Derivative Approximation with DRBFN 

Approximation of the first derivative on 250 target points uniformly spaced 




target points 



-1 -0.5 

target points 



(c) Second Derivative Approximation with DRBFN 

Approximation of the second derivative on 250 test nodes uniformly spaced 




target points 



3.2. Numerical Experiments with other RBF 



39 



The indirect method implementation is in the Matlab script MayIRBFN2test.m which 
uses the function MayIRBFN2train.m to compute the weights. We show the plots corre- 
sponding to the approximation obtained by the indirect method in Figure 3.2, while the 
error table is Table 3.2. 

Table 3.2: Error of IRBFN method 





Function 


First Derivative 


Second Derivative 


G 


2.0e - 04 


Lie -02 


6.2e - 01 


IM 


2.0e - 04 


Lie -02 


6.0e - 01 


M 


1.8e-05 


9.2e - 04 


5.0e - 02 



3.2 Numerical Experiments with other RBF 

The following subsections describe a series of further numerical experiments we made 
with other RBF to better understand the functioning of the previous methods (DRBFN, 
IRBFN). 



3.2.1 Multiquadrics 

Multiquadrics are conditionally positive definite of order k = max(0, [/?]). Since in our 

case /3 = |, we have k = 1. 

As we discussed in the first chapter, in order to guarantee the non-singularity of the 

interpolation matrix, it is necessary to add a polynomial of degree k— 1 to the interpolation 

formula (see 1.27). 

However, in the specific case of generalized multiquadrics the following theorem holds (see 

[1], chapter 9, page 77): 

Theorem 3.1 Suppose $ is strictly conditionally positive definite of order one and that 
$(0) < 0. Then for any distinct points x 1; ...,x n G M s the matrix A with entries Ajj, = 
$(xj — Xfc) has n — 1 positive and one negative eigenvalue, and is therefore non-singular. 

What does it happen if, instead of using the previous theorem, we use the following scheme 
to approximate the function (3.11) and its first and second derivatives? 



f'(x) = J2Zi^H\x) + C 1 x + C 2 
fix) = YZi ^H\x) + C lX 2 + C 2 x + C 3 



(3.13) 
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Figure 3.2: Approximation with IRBFN 



(a) Function Approximation with IRBFN 

Approximation of the function on 250 target points uniformly spaced 



(b) First derivative Approximation with IRBFN 

Approximation of the first derivative on 250 target points uniformly spaced 
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(c) Second derivative Approximation with IRBFN 

Approximation of the second derivative on 250 target points uniformly spaced 




target points 
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Moreover, to compute the weights we have to add the conditions: 



(3.14) 



Figure 3.3: Experiment with Multiquadrics with C\ ^ 
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target points 



(b) 



(c) 



Approximation of the first derivative on 250 target points uniformly spaced 



Approximation of the second derivative on 250 target points uniformly spaced 





target points 



target points 



From Figure 3.4 we can see that the solution is accurate but differs from the original one 
for a linear polynomial, i.e. an aliasing phenomenon occurs. This is significant of the 
fact that the constant C\ is superfluous, i.e. should be chosen equal to zero. Indeed, in 
this case, still considering the conditions on the weights, we obtain a good approximation. 
Table 3.3 shows the /2-error in both cases. 
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Table 3.3: Error of IRBFN method 





Function 


First Derivative 


Second Derivative 


M 


1.8e-05 


9.2e - 04 


5.0e - 02 


M with d ^ 


2.0e-05 


3.6e + 01 


2.4e + 01 


M with d = 


6.9e - 04 


3.0e - 02 


1.2e + 00 



Figure 3.4: Experiment with Multiquadrics with C\ = 

(a) 



Approximation of the function on 250 target points uniformly spaced 




target points 



(b) 



(c) 



Approximation of the first derivative on 250 target points uniformly spaced 



Approximation of the second derivative on 250 target points uniformly spaced 





target points 



target points 
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3.2.2 Thin Plate Splines 

As we did in the previous section, we apply the indirect method (IRBFN) to approximate 
the function y(x) = x 3 + x + 0.5 in the interval —3 < x < 2 using Thin Plate Splines 

$(x) = ||x|| 2/3 log||x|| xel s /3eN 

Choosing /3 = 1, the function becomes strictly conditionally positive definite of order 

k = 2. Hence Theorem 3.1 is not valid any more. 

Since k = 2, we should use the following formulation to solve the interpolation problem. 

f'(x) = YZi «>il|s|| 2 lo § IMI + Cix + c 2 

f'(x) = YZi «^IM| 3 (31og ||x|| - 1) + C x x 2 + C 2 x + C 3 (3.15) 

f(x) = TZi ^Tiilkll 4 (12 log ||x|| - 7) + Cn 3 + C 2 x 2 + C 3 x + C 4 

To compute the weights we have to add the conditions (see section 1.2 for further expla- 
nation): 

(3.16) 

As we can see from Figure 3.5 the behaviour of the approximation is similar to the previous 
case of multiquadrics. Thus, we tried, to put the constants C\ and C 2 equal to zero, still 
considering the conditions on the coefficients (3.16). Table 3.4 shows the corresponding 
/ 2 -errors. 

Table 3.4: Error of IRBFN method 





Function 


First Derivative 


Second Derivative 


TPSCi^0,C 2 ^0 


6.4e - 06 


l.le + 02 


1.2e + 02 


TPSC , 1 = 0,C 2 = 


3.7e-03 


7.0e + 02 


6.0e + 00 


M 


1.8e-05 


9.2e - 04 


5.0e - 02 



3.2.3 Wendland 

Till now we have always used globally defined radial basis functions. In this section 
we illustrate some results obtained using compactly supported radial basis functions. In 
particular we refer to Wendland functions. 
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Figure 3.5: Experiment with Thin Plate Splines with C\ ^ 0, C2 7^ 

(a) 



Approximation of the function on 250 target points uniformly spaced 




(b) 



(c) 



Approximation of the first derivative on 250 target points uniformly spaced 




Approximation of the second derivative on 250 target points uniformly spaced 




target points 



target points 
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Figure 3.6: Experiment with Thin Plate Splines with C\ = 0, C2 = 

(a) 



Approximation of the function on 250 target points uniformly spaced 




(b) 



(c) 



Approximation of the first derivative on 250 target points uniformly spaced 




Approximation of the second derivative on 250 target points uniformly spaced 




target points 



target points 



46 Chapter 3. Simultaneous approximation of function and its derivatives 

3.2.3.1 Wendland functions construction 

At first we have to notice (see [1] for further explanation) that compactly supported 

functions $ which are truly strictly conditionally positive definite of order k > do 

not exist. Indeed, the compact support automatically ensures that $ is strictly positive 

definite. 

Another observation is that compactly supported radial functions can be strictly positive 

definite on M. s only for a fixed maximal s-value. Indeed, a function is strictly positive 

definite and radial on 1R S if its s-variate Fourier transform is non-negative. 

After this two considerations, we can introduce an integral operator and its inverse, a 

differential operator, which can facilitate the construction of compactly supported radial 

functions. 

Definition 3.1 Let ip be such that t — ► tip(t) G Li[0,oo). Then we define the integral 
function operator I via 

POO 

(I(p)(r) = / t(p(t)dt r > (3.17) 



For even (p G C 2 (1R) we define the differential operator D via 

(D(p)(r) = — (p'(r) r > (3.18) 

r 

These operators allow us to construct new strictly positive definite radial functions from 
given ones by a "dimension walk" technique that steps through multivariate Euclidean 
space in even increments. 

One of this family of compactly supported functions are the Wendland functions. Wend- 
land starts with the truncated power function ipi(r) = (1 — r) l + and then walks through 
dimensions by repeatedly applying the operator /. 

Definition 3.2 With <fi(r) = (1 — r) l + we define 

ip s ^ = I <Pls/ 2 \+k+l- (3.19) 

It turns out that the functions ip s ^ are all supported on [0, 1] and have a polynomial 
representation there. Indeed, 

Theorem 3.2 The functions (p s ^ are strictly positive definite and radial on 1R S and are 

of the form 

__ f p a , fc (r), re [0,1] 
Vs,k - I Q) r>1 (3.20) 

with a univariate polynomial p s ^ of degree \_s/2\ + 3k + 1. Moreover, p s ^ G C 2fc (lR s ) 
are unique up to a constant factor, and the polynomial degree is minimal for given space 
dimension s and smoothness 2k. 
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Some examples: 

¥>3,i = ±(1 - ar)%(4ar + 1) 
cp 3t2 = i(l - r)5.(35aV + 18ar + 3) (3.21) 

^33 = A(i _ or )8_(32a 3 r 3 + 25a 2 r 2 + 8ar + 1) 

3.2.3.2 Numerical experiments with Wendland functions 

As we did in the previous sections we approximate the polynomial f(x) = x 3 + x + 0.5 
in the interval [—3,2], using 50 equally spaced points as centers and 250 equally spaced 
points as target points. 

For our purposes, we use the Wendland ip^i which is C 2 in the origin, and we differentiate 
it two times in order to approximate the first and the second derivative. So, we have: 

^3,1 ( r ) = (1 - or)^(4ar + 1) 
f^3,i(r) = -20o 2 r(l-or) 3 f 
J^3,i(0 = 20a 2 (l - ar) 2 + (4ar - 1), 

We also use the Wendland c/^2 which has order C 4 at the origin: 

(p 3a (r) = (1 - ar)^(35a 2 r 2 + 18ar + 3) 
£<P3,2(r) = -56a 2 r(5ar + 1)(1 - arf + 
£s<P3,2(r) = 56a 2 (l - ar)^_(35a 2 r 2 - 4ar - 1), 

and Wendland (^33 which has order C 6 at the origin: 

p 3)3 (r) = (1 - ar) 8 + (32a 3 r 3 + 25a 2 r 2 + 8or + 1) 
1-^33^) = -22o 2 r(16a 2 r 2 + Tar + 1)(1 - ar) 7 + 
£z<p 3t3 (r) = 22o 2 (l - or)^(160o 3 r 3 + 15a 2 r 2 - 6ar - 1). 

The implementations are in the Matlab scripts training. m and testnetwork.m. We have to 

notice that we use a different shape parameter for the Multiquadrics and for the Wendland 

functions, i.e. in both cases we have di = 0.102, but /3 = 2 for Multiquadrics and /3 = 4 

for the Wendland functions. 

What is more, to build the matrix for the Wendland we use the specific Matlab function 

DistanceMatrixCSRBF _new.m which gives a sparse matrix. Moreover, the greater the 

width parameter, the sparser the matrix. 

In Figure 3.7 we display the plots while in Table 3.5 the / 2 -errors: 
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Table 3.5: Error of DRBFN method 





Function 


First Derivative 


Second Derivative 


M 


9.2e - 02 


3.9e + 00 


1.5e + 02 


¥>3,1 


1.5e-01 


1.8e + 02 


2.6e + 02 


^3,2 


4.6e - 02 


4.5e + 02 


8.1e + 01 


^3,3 


7.7e - 03 


2.7e + 03 


1.5e + 01 



Figure 3.7: Direct Method using Wendland function as basis 



(a) Matrix Sparsity 




(c) First Derivative Approximation 

Approximation of the first derivative on 250 target points uniformly spaced 
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(b) Function Approximation 

Approximation of the function on 250 target points uniformly spaced 
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(d) Second Derivative Approximation 



Approximation of the second derivative on 250 target points uniformly spaced 
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We also try to apply the indirect method (IRBFN). We integrate y?3,i, <f3.2, ^3,3 and ^4. 
Table 3.6 shows the /2-error. 
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Table 3.6: Error of IRBFN method 





Function 


First Derivative 


Second Derivative 


M 


1.8e-05 


9.2e - 04 


5.0e-02 


¥>3,1 


1.9e-02 


1.5e + 04 


7.8e-01 


^3,2 


3.7e - 03 


3.4e + 04 


1.6e-01 


^3,3 


Lie -03 


9.8e + 04 


5.0e-02 



Figure 3.8: Indirect Method using Wendland function as basis 



(a) Matrix sparsity 



(b) Function Approximation 

Approximation of the function on 250 target points uniformly spaced 
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target points 



(c) First Derivative Approximation 

Approximation of the first derivative on 250 target points uniformly spaced 
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(d) Second Derivative Approximation 



Approximation of the second derivative on 250 target points uniformly spaced 




target points 



target points 



In this case it seems, from a trial and error approach, that the best choice for the shape 
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parameter is a, = 0.2 • 0.102. This choice leads, as we can see from Figure 3.8 to a dense 

matrix. 

This behaviour, can be explained by observing that in the indirect method, it is necessary 

to add two more columns to the matrix which computes the weights, with the aim to 

estimate the two integration constants C\ and C<i (see 3.12). 

The columns' addition breaks the sparse structure as we show in the following figure: 



Finally we observe that the worst approximation is obtained in evaluating the first deriva- 
tive, both for Thin Plate Splines and Wendland functions. 

This is due to the fact that the integral and differential operators, which define Wendland 
functions, allow a "dimension walk" that steps through multivariate Euclidean space 
in even increments. Thus, it is only through the double application of the differential 
operator that we obtain a function that is still in the Wendland native space. 
The same is true also for Thin Plate Splines which are defined only for parameters j3 G N, 
i.e. 

$(x) = ||x|| 2/3 log||x|| xgM s /3gN. 

This means that we can always obtain good approximation for even derivatives but not 
for the odd ones. 



3.2.4 Considerations on Native spaces 

In section 1.3 we have listed the native spaces for some radial basis functions. In particular 
we saw that Wendland's compactly supported functions have native spaces 



N* 



') = W, 



s/2+k+l/2 {m>s -. 



while Thin Plate Splines have the so-called Beppo-Levi space of order k: 
BL k = {/ G C(R S ) : D a f G L 2 (R S ) for all |a| = k, a G N s } . 



(3.22) 



(3.23) 
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In order to obtain the error estimates in (3.26) and (3.27) we state some preliminaries 
results (proofs are in [7, 8]). 

Theorem 3.3 Suppose that ft C 1R S is bounded and satisfies an interior cone condition. 
Let I E No and a E Nq with \a\ < I. Then there exist constants ho, c[ , c^ > such 
that for all X = {xi, ...,xjv} C fl with hx,ci < h and every x E fl there exist numbers 
u" (x) , . . . , M^r (x) with 

1- Ef=i u«(x.)p( Xj ) = D a p( X ) for allpE ^(M 8 ), 

*■ E7 =1 i^(x)i< c S Q ^-g, 

3. u?(x) = 0, if ||x - Xj-Ha > c { ^hx,n. 

The next result deals with (conditionally) positive definite functions. 

Theorem 3.4 Suppose that $ E C k (M, s ) is conditionally positive definite of order m. 
Suppose further that fl C M s is bounded and satisfies an interior cone condition. Fix 
I > m — 1. For a G Nq with \a\ < k/2 and X = {x l7 ...,xjv} C fl satisfying hx,n < ^o the 
power function can be bounded: 



[P° x (x)] 2 < |D 2Q $(0) - D 2a P (0)\ 

Y I 

II 00 (B(0,4 0, /i x , n )) 
'-V.J2 II'- 1 ' _ ^llL 0o (B(0,24 a) /i J r, n )) 



2 C ; Q) / i -giiD°$-i ? >| lr ; _„ ;/ 



4 Q) ] 2 ^ I |I$-PII,^ B ,,^),.., (3-24) 

where p is an arbitrary polynomial from 7T;(M S ) and the constants ho,c\ , c^ come /rom 
Theorem 3.3. 

Now we can state a generic error estimate 

Theorem 3.5 Suppose that $ G C^(1R S ) is conditionally positive definite of order m. 
Suppose further that fl E M. s is bounded and satisfies an interior cone condition. For 
a E Nq with \a\ < k/2 and X = {x l7 ...,xjv} C fl satisfying hx,n < h we have the error 
bound 

||D°/(x) - D"P f (x)\\ Loo (fl) < Ch$£ )/2 -W\\f\\ N . (a) (3.25) 

Now we can apply the general result of Theorem 3.5 to Wendland and Thin Plate Splines 
functions. 

• Wendland Function. 

|T/7(x) - D°P f {x)\ < C7# n Ha| ||/||*. (n) (3.26) 

for every \a\ < k, hx,n sufficiently small. 

Remark. We have to notice that in the case of ip^i we can use this estimate only 
for the function and first derivative, since lal < k. 



Wp(Sl) - [E\ a \<m\\ Da f\\ P J 
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• Thin Plate Splines. 

|ZT/(x) - D a P f (x)\ < Ch k x -W\f\ N , m (3.27) 

for every \a\ < k — 1, hx t n sufficiently small. 

In order to obtain the error estimates we should also compute the Sobolev norm and 
Beppo-Levi semi-norm of the function /. 

\L p (a)J 

(3.28) 

\f\BL k = [z2\ a \=m ^\\D a f\\ Lp (n)) 

A function that belongs to these spaces is the gaussian e^~ x \ For this function we 
compute an estimate of the constant C in (3.26), by the following computation: 

= \D°f(x)-D°P f (x)\ 

,fe+|-H ||f|| l ' ; 

n x,n II/IIjv* 

By considering the maximum of C we obtain a first estimate of the error. We also try to 
take the mean value of vector C, but this leads to underestimate the error. 
We display the error plots for (p^i, ip^2 and ^3 in Figure 3.9 and 3.11, where in black is 
the error estimate when C assumes its maximum value. 

In the case of Thin Plate Splines, since we are considering k = 1 the (3.27) is valid only 
for the function approximation. For the estimate of the constant C we proceed as in the 
previous case, by considering the following formula: 

_ \D°f(x)-D°P f (x)\ 

~ h k ~^\f\ [ } 

n x,n l/liv* 

In Figure 3.11 we display the error trend for the function approximation by using Thin 
Plate Splines. The black points are the error estimate obtained by taking the maximum 
value assumed by C. 

On the other hand, it is quite easy to see that the polynomials are not in these two spaces. 
So the previous results are not valid. However, the following theorem holds: 

Theorem 3.6 Let k and n be integers with < n < k < r and k > s/2, and let f G 
C h (Cl). Also suppose that X = {xi,...,xjv} C ft satisfies diam(X) < 1 with sufficiently 
small fill distance hx,n- Then for any 1 < q < 00 we have 

\f - PfW m < C pF fc ^" s(1/2 " 1/9)+ ll/ll^ ( n), (3.3i) 

where px = — is the mesh ratio for X and qx is the separation distance \ min^- ||xj — 

Xj\\ 2 . 
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Figure 3.9: Error trend using Wendland function as basis 
(a) error function with tp^\ (b) Error first derivative with 
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(c) Error function with ipsp 
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(e) Error second derivative with tps^ 
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Figure 3.10: Error trend using Wendland function as basis 
(a) Error function with ^3 (b) Error first derivative with ^3^ 
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(c) Error second derivative with ^3 
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Figure 3.11: Error trend using Thin Plate Splines function as basis 




In the case of Thin Plate Splines, choosing r = 2/3 and k = r, n = 0,q = oo we arrive at 
the bound: 



\f-Pf\ Loo <ch 



2/3-s/2 

x,n 



C-2/3 



(fi). 



As we did for the function e x we can estimate the constant c as: 

\f - Pf\Loo 



hTn /2 \\f\\MCl) 



(3.32) 



(3.33) 



Thus we obtain: 



Table 3.7: Constant estimate for Thin Plate Splines 



TPS Constant 1.8e - 03 



For Wendland functions we can choose r = k (where k is the order of the Wendland 
function in the notation <p a ,k) an d n = 0, q = oo. In this way we are led to: 



1/ - Pf\ Laa < ch 



k-s/2 

x,n 



c k (ti)- 



(3.34) 



As for Thin Plate Splines we estimate the constant c: 



Table 3.8: Constant estimate for Wendland 



(fs t i constant 


6.7e- 


-03 


(/9 32 constant 


2.9e- 


-02 


(ps,3 constant 


8.9e- 


-02 
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3.2.5 Network existence and localization properties 

In this section, we first state an existence result for the RBF network for simultaneous 
approximation of a function and its derivatives under some conditions on the kernels and 
the function to be approximated (see [12]). Then, we discuss localization properties for 
different kernels. 

Let Z s denote the set of all multi-integers in R s . For k = (ki, ..., k s ) G Z s , let \k\ = 
Yl S j=i \kj\ and for m = (mi, ...,m s ) G Z s we say that k < m, if kj < rrij for all j, 
!<3<s. 

We denote with D m f the mth order partial derivative of a multivariate function /. Clearly 
in the univariate case D m f = f( m \ 

Then, we denote with C^ mi '-' m "\K) = C mi (K) n ... n C m «(K) the set of all function / 
such that for any k G J(mi, ...,m q ), D k f is continuous on an open set containing the 
compact K, where ni; G Z s + and J(nii, ...,m q ) is the set of all k G N s such that k < rm 
for some I, 1 < I < q. 

Finally, let s(x) = YlT=i c jV ? (^jll x ~~ t j 1 1 ) , be the network for the simultaneous approxi- 
mation, where Cj G R, Xj > 0, tj G R s and <p(\\ ■ ||) is the radial basis function. 
The following theorem holds: 

Theorem 3.7 Suppose that a univariate function (p is analytic in (—r,r), where r > 0, 
and for some integer p > 1, (p( p: >\0) ^ for j > 0. Then for any f G C , ^ miv "' m<? - ) ([— 7r,7r] s ), 
and any e > 0, there is a RBF neural network 



m 



6 '( x ) = z^ c M x j\\ x - t j\\) ( 3 - 35 ) 

where Cj G R, tj G [— 27r, 27r] s ; and < Xj < r/(9-7r 2 s), for 1 < j < m, such that 

||D k /-^ k s|| Loo([ _ 7r ^ ) < e (3.36) 

for all k G J(mi, ..., m q ). 

We first remark that most RBFs studied in literature, such as Gaussians, Multiquadrics 
and Inverse Multiquadrics satisfy the assumptions of Theorem 3.7. 

Moreover, by considering the convolution operator and the introduction of mollifiers, we 
may extend the theorem to every compact set: 

Corollary 3.8 Let ip be assumed as in Theorem 3.7. Suppose that K is a compact subset 
o/R s , and f G C^ mi ''"' mq \K). Then for any e > 0, there is a network s in the form of 
equation (3.35) for some suitable Cj G R, Xj G R, and tj G R s , where 1 < j < m, such 
that 

\\D k f-D k s\\ LooiK) <e (3.37) 

for any k G J(mi, ..., m g ). 
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Finally, we observe that the density results can also be established for the L p norm, i.e. 

even if the function / is defined in a Sobolev space. 

Clearly, the RBF network defined in the first section of this chapter, is a particular case 

of the previous, since can be obtained by choosing Xj = 1 and x = t,-. 

Referring to [18, 14], we comment the numerical results of the previous section, in which 

we have seen that better approximations are obtained by using Multiquadrics. 

Consider the following quasi-interpolation formula: 

s (x) = Y, /(y)^( x - y) x e Rn (s-ss) 

y€Z n 

where / : D C R n — ^ M is the function to be approximated by s and Z n is the set of 
vectors in M. n whose components are integers and if) is of the form: 

m 

^(x) = 5^ fc¥ >(||x-x fc || 2 ), x£l" (3.39) 

fc=i 

We showed that radial basis function approximations can have good localization properties 

even if ip(r) — ► oo as r — ► oo. 

Indeed, it seems that such radial functions have advantages over those ones that tend to 

zero as r — ► oo. 

Here localization means that, if D C M. n is the domain of the approximation, if s(-) is 

calculated from /(•), and if Do is any part of D, then the contribution to s(x) from 

{/(y) : y ^ A)} is small for values of x in D that are far from Dq. 

Such localization properties are highly important to the success of general approximation 

methods. 

In the one-dimensional case is quite easy to show these properties. Indeed, if we let 

(p(r) = r, and if we compute the coefficients {jik : k = 1,2, ...,m} of expression (3.39) so 

that they satisfy conditions (3.1), then s(-) is the piecewise linear interpolant and we can 

express it in the form: 

m 

s ( x ) = 5Z f( x k)^k{x) Xi <x <x m (3.40) 

fc=l 

where each ipi{-) is independent of /(•). 
For example ^(O is the hat function 

(X 3 -X 2 )|x-Xi| - (# 3 -Xi)\x-X 2 \ + (X 2 -Xi)\x-X 3 \ , , fo At\ 

ip 2 {x) = — -, : Xi < x < x m (3.41) 

2(x 2 - xi)(x 3 - x 2 ) 

In general, we have to search in the space spanned the functions 
{y(|| • — Xi\\ 2 ) : i = 1, 2, ...,m} if we wish to discover whether a particular ip(-) al- 
lows good localization properties. 
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In order to have an absolutely convergent sum for smooth ip(-) and any bounded function 
/(•), we need the following minimum localization conditions: 

/ R » |^(x)|dx< oo 

(3.42) 
/ M „'0(x)dx= 1 

Whether or not they can be satisfied when if) has the form (3.39), depends on the radial 

function tp and on the dimension n. 

In particular, in the case of multiquadrics and in the case n = 1, conditions on the 

coefficients fi^, in formula (3.39), exist such that the minimum localization conditions are 

satisfied for each choice of the shape parameter. 

Indeed, the conditions on jdk are independent of the shape parameter. 

For what concern the case of radial functions such that ip(r) — ► as r — > oo, as inverse 

multiquadrics, in [14] it has been shown that, if we require that J_ \ip(x)\dx < oo then 

we have f^° ip{x)dx = 0. 

This is undesirable, since it shows that the quasi-interpolant to a constant function has 

zero integral if if) is absolutely integrable. Thus, the approximation is no more accurate 

than s = if we try to obtain good localization properties for the cardinal functions. 

Finally, the thin plate splines are unsuitable for quasi interpolation when n = 1. 

We consider also a more recent work on the localization properties of radial basis functions 

by Fornberg ([15]). Here a different approach is used, based on studying the behaviour of 

the resulting RBF expansion coefficients, i.e. \x^ in (3.39), for increasing \k\. It has been 

shown that the leading order behaviour of the expansion coefficients for small \k\ decays 

exponentially. However, some RBF can exhibit two different decays regimes for increasing 

k. In any case, the localization properties are determined by this exponential decay. 



Chapter 4 

Solving high order differential 
equations using RBF networks 



In Chapter 3 we have studied a method to approximate a function together with its 

derivatives. In [9], it has been shown that it is possible to use this method also to 

approximate high order derivatives. This fact is very useful if we want to solve ordinary 

differential equations with high order derivatives. 

However, only the indirect method provides accurate solutions, as we can see from the 

following example. 

Consider the following differential equation: 

x A y {A) - 4rV 3) + x 2 {\2 - x 2 )y {2) + 2x(x 2 - 12)y {1) + 2(12 - x 2 )y = 2x 5 (4.1) 

in the interval 1 < x < 11, subject to the boundary conditions: 

y(l) = l + e + l 

y(ll) = 11 + 121 + 1331 + lie 11 + lie" 11 

y'(l) = 2e 

y'{ll) = 1 + 22 - 363 + 12e n - lOe 11 

The exact solution of (4.1) is: 

y(x) = x + x 2 — x 3 + xe x + xe~ x 

We applied both the direct and the indirect method, using Multiquadrics as kernel, choos- 
ing 6, 11, 17,21,50, 100 uniformly spaced centers in the interval [1, 11] and j3 = 7 '. After 
the computation of weights, we test the network on 1001 uniformly spaced target points. 
We denote with G the matrix for the computation of weights. The matrix is obtained 
substituting the network approximation of the function and of the derivatives in the 
differential equation (4.1) and in the boundary conditions. For example in the IRBFN 
case we have: 

m m+1 m+2 

4 " 



y^ Wi$(x) - 4x 3 ^2 i»iHi(x) + x 2 (12 - x 2 ) ^2 WiH 2 (x) 

i=l i=l i=l 

771+3 771+4 

2x(x 2 - 12) Y^ WiH 3 (x) + 2(12 - x 2 ) J^ Wi H 4 (x) = 2x 5 (4.2) 



i=l i=l 
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Er=tv#4(i) 
£r = t 3 «^3(i) 
Er=tv#4(ii) 
Er = t 3 ^3(n) 



I + e + i 
2e 

II + ll 2 + ll 3 + lie 11 + lie" 11 
l + 22 + 363 + 12e 11 -10e- 11 



(4.3) 



where we denote with H 1 ,H 2 ,H 3 , H^ the first, the second, the third and the fourth suc- 
cessive integrations of the kernel <f>. 

We denote the first block of matrix G with G\ 



G 1 = x i <&-4x 3 H 1 +x 2 (12 



)H 2 + 2x(a; 2 -12)H3 + 2(12 



)H, 



(4.4) 



where <t>, Hi, H2, H3, H4 are the matrices obtained from the evaluation of the functions 
<&,Hi, H 2 , H 3 , H^ in the distance matrix. 

Thus we have: 



G 



( Gl \ 

cond\ 

cond 2 

conds 

\c0nd4J 



(4.5) 



where condi, cond 2 are the vectors describing the boundary conditions for the function / 
and condz, cond^ are the boundary conditions on the first derivative. 

The error is a weighted square-error: 



ERR 



E'Um-Pf^W 



E 



TV 2 

iVi 



(4.6) 



where y is the exact solution and Pf its approximation. 

We show the result in Figure 4.1 and we give also a table with the errors and the condition 
number of the matrix G for different choices of centers. The matlab scripts DRBFNode.m, 
DRBFNodeTest.m, IRBFNode.m, IRBFNodeTest.m contains the implementations of the 
direct and indirect method for the solution of ODE respectively. The last two contain 
also the implementation of the Riley method for the study of the condition number (see 
Section 4.1) 
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Figure 4.1: Solution of ODE through DRBFN 



(a) 6 centers 

Approximation of the function on 1001 target points uniformly spaced 




(c) 17 centers 

K Approximation of the function on 1001 target points uniformly spaced 




(e) 50 centers 

Approximation of the function on 1001 target points uniformly spaced 




(b) 11 centers 

K 1 Approximation of the function on 1001 target points uniformly spaced 




5 6 7 

target points 



10 11 



(d) 21 centers 

x 1 Approximation of the function on 1001 target points uniformly spaced 




9 10 11 



target points 



(f) 100 centers 

x APP rox i ma t.i o n of the function on 1001 target points uniformly spaced 



10 11 




10 11 



target points 



target points 
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Table 4.1: Error of DRBFN method 



Number of centers 


Error 


Condition Number 


6 


2.5e + 00 


1.5e + 06 


11 


2.0e + 00 


3.2e + 07 


17 


2.5e + 00 


6.4e + 07 


21 


2.6e + 00 


7.9e + 07 


50 


2.6e + 00 


1.6e + 08 


100 


2.5e + 00 


3.7e + 08 



As we can see from Figure 4.1 and from the error table, the solution given by the direct 

method is totally inaccurate. However the condition number of the matrix G is not too 

large. 

We apply now the indirect method; plots are shown in Figure 4.2 and errors in Table 4.2. 

Table 4.2: Error of IRBFN method 



Number of centers 


Error 


Condition Number 


6 


2.5e-02 


1.8e + 14 


11 


2.2e-04 


5.5e + 12 


17 


6.2e-06 


1.7e + 13 


21 


1.2e-06 


3.6e + 13 


50 


1.4e-07 


7.7e + 14 


100 


7.4e - 08 


9.3e + 15 



4.1 Analysis of the condition number 



From now on, we focus on the analysis of the condition number of the matrix G. As 
we can see from table 4.1, the direct method is very inaccurate, but the matrix for the 
computation of weights has not a bad condition number as in the indirect case. 
How to lower the condition number? We show two possible solutions. 



4.1. Analysis of the condition number 
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Figure 4.2: Solution of ODE through IRBFN 



(a) 6 centers 

Approximation of the function on 1001 target points uniformly spaced 



-real 
- approx 




10 11 



target points 



(c) 17 centers 

K Approximation of the function on 1001 target points uniformly spaced 



- real 

- approx 




9 10 11 



target points 



(e) 50 centers 

Approximation of the function on 1001 target points uniformly spaced 




(b) 11 centers 



x 1 Approximation of the function on 1001 target points uniformly spaced 



-real 
- approx 




10 11 



target points 



(d) 21 centers 

x 1 Approximation of the function on 1001 target points uniformly spaced 



real 
- approx 




9 10 11 



target points 



(f) 100 centers 

x Approximation of the function on 1001 target points uniformly spaced 



10 11 




10 11 



target points 



target points 
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4.1.1 Scaling the domain 

The first one consists in scaling the domain in which the ordinary differential equation 
(4.1) is defined, from 1 < x < 11 to ^ < x < 1. We also changed the parameter (3, and 
we put j3 = 0.5 since we have noticed that this gives a better result. 

The implementations of the scaled direct method are contained in the Matlab scripts 
DstabOde.m, DstabODEtest.m, while the ones for the indirect method are in the scripts 
stabODE.m and stabODEtest.m. In tables 4.3 and 4.4 we display the new error values 
and new condition numbers. 

Table 4.3: Error of DRBFN method 



Number of centers 


Error 


Condition Number 


6 


1.8e + 00 


1.3e + 03 


11 


8.2e + 00 


1.7e + 04 


17 


1.2e + 00 


2.8e + 04 


21 


8.6e-01 


2.1e + 04 


50 


1.0e + 00 


1.6e + 04 


100 


1.0e + 00 


1.6e + 04 



Table 4.4: Error of IRBFN method 



Number of centers 


Error 


Condition Number 


6 


6.9e-02 


4.1e + 07 


11 


1.4e-03 


1.0e + 08 


17 


6.2e - 04 


1.6e + 08 


21 


3.4e - 04 


2.1e + 08 


50 


5.5e-04 


1.0e + 09 


100 


6.9e - 04 


4.3e + 09 



Using the scaled method we obtained a better condition number for the matrix G in 

both the cases. However, this improvement causes a loss of accuracy in the solution. This 

means that the solution given by the direct method is totally useless, while for the indirect 

method there is only a small worsening. 

We also noticed that the scaling technique works well in this case at least for two reasons: 

the first is that the definition interval of the ODE is very big ([1,11]) and so the scaling 

makes a big difference; the second is that the non constant coefficient of the equation are 

of high order. This causes the matrix G to have entrances like ll 4 and this behaviour 

leads to a big condition number. 

There is no assurance that this technique can work well in completely different cases. 
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4.1.2 Riley method and iterative methods 

The second approach we would try to use to solve the condition problem is by the Riley 

method for linear systems (see [16]). 

We illustrate how this method works. Suppose one has to solve a linear system, Ax = b 

with matrix A which is ill conditioned. 

Instead of solving this system we consider the perturbed matrix C = A + \xl and we solve 

Cy = b. Since A = C — fil we can derive A -1 = - X^fcli(y u C~ 1 ) fc an d so we can get the 

solution of the original system as: 



x 



A~ x b 

1 oo 



***=! 



1 oo 

^ k=l 

y + ^C~ l )y + ^C~ l ) 2 y + ... (4.7) 



Thus we have x = y + {\xC 1 )y + (jjC 1 ) 2 y + ..., that we can rewrite as Xk+i = Xk + 
(fiC-^y. 

The problem now is: how to choose \xl 

In his paper of 1955, Riley (see [16]) suggested to choose jj, = lO*" -7 '-', where p is the 
desired precision and a G {2,3}. 

In any case, to obtain a better condition number for the matrix A, fi should be greater 
than the minimum eigenvalue of A. On the other hand, for a fast convergence [i should 
be smaller or equal to the minimum eigenvalue of A. 

In our numerical experiments, (see Riley Residuals. m, trial. m) we try to find the best value 
for \x using a trial and error approach. Moreover, in the algorithm which implements the 
Riley method we have used the QR factorization instead of the Cholesky one since the 
matrix is not positive definite. 

For a number of centers equal to 6, 11, 17, 21 the best error value is obtained when \i has 
the same order of the minimum eigenvalue of A, that means that we have no improvement 
in the condition number. 

Finally we tried to use iterative methods applied to the system matrix. Differently from 
what we did for the Riley method we applied these methods to the scaled problem. As in 
the case of Riley method we had no improvements in the condition number of the matrix 
A. This is due mostly to the specific example we are considering. Indeed, in the case we 
deal with 21 centers, the condition number of the matrix is 2 • 10 8 (not so high) but the 
vector of coefficients has a very big norm: 8 • 10 6 . If we compute the relative residual, i.e. 
lf-jl , in the best case (i.e. using the iterative methods) is 3 • 10 -11 and considering the 
condition number we can obtain nothing better than what we have with the scaling. 
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Application to biological ODEs systems 



We want to consider the classical basis function interpolation problem for vector valued 

functions. This naturally leads to the use of matrix- valued kernels. 

The application is the solution of systems of ordinary differential equations. 

5.1 Reproducing Kernel Hilbert Spaces of vector- 
valued functions 

We start by stating some elementary results useful to extend the radial basis function 
theory to the vector- valued case (see [18, 19] for further explanation). 
Let y be a real Hilbert space with inner product (•,•), X a set and TC a linear space of 
functions on X with values in y. We assume that TC is also a Hilbert space with inner 
product (•,•). 

Definition 5.1 We say that TC is a reproducing kernel Hilbert space (RKHS) when for 
any y G y and x G X the linear functional which maps f G TC to (y, f(x)) is continuous. 

In this case according to Riesz lemma, there is for every x G X and y G y a function 
K(x\y) G TC such that, for all / G TC 

(y,f(x)) = (K(x\y),f). (5.1) 

Since K(x\y) is linear in y, we write K(x\y) = K x y where K x : y — ► TC is a linear operator. 
We can rewrite (5.1) as 

(y,f(x)) = (K x y,f). (5.2) 

For every x,t G X we also introduce the linear operator K(x,t) : y — ► y defined, for 
every y G y, by 

K(x,t)y = K t (y)(x) (5.3) 

We say that TC is normal provided there does not exist (x,y) G X x (y \ {0}) such that 
the linear functional (y, f(x)) = for all / G TC. 

In the following proposition we state the main properties of the function K . To this 
end, we let C(y) be the set of all bounded linear operators from y into itself and, for 
every A G C(y), we denote by A* its adjoint. We also use C + (y) to denote the cone of 
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nonnegative bounded linear operators, i.e. A G C + (y) provided that, for every y G y, 
(y, Ay) > 0. When this inequality is strict for all y ^ we say A is positive definite. 
Finally, we denote by N m the set of positive integers up to and including m. 

Proposition 5.1 If K(x,t) is defined, for every x,t G X, by equation (5.3) and K x is 
given by equation (5.2) the kernel K satisfies for every x,t G X, the following properties: 

1. For every y,z G y, we have that 

(y,K(x,t)z) = (K t z,K x y). (5.4) 

2. K(x,t) G C(y), K(x,t) = K(t,x)*, and K(x,x) G C+Qt). Moreover, K(x,x) is 
positive definite for all x G X if and only if TC is normal. 

3. For any m G N, {xj : j G N m } C X, {yj : j G N m } C y we have that 

Y, (VjiKix^xM^O. (5.5) 

4. \\K x \\ = \\K(x,x)\\^. 

5. \\K(x,t) < \\K(x,x)\\*\\K(t,t)\\'. 

6. For every f ETC and x G X we have that 

||/(x)||<||/||||^(x,x)||i (5.6) 

We say that K : X x X ^ £(y) is a kernel if it satisfies properties 1-3. 

So far we have seen that if 7i is a RKHS of vector-valued functions, there exist a kernel. 

In what follows, we show that a kernel determines a RKHS of vector- valued functions. 

Theorem 5.2 If K : X x X — ► C(y) is a kernel then there exists a unique (up to an 
isometry) RKHS which admits K as the reproducing kernel. 

We have to observe that in the case y = W 1 the kernel K is a n x n matrix of scalar- valued 
functions. The elements of this matrix can be identified by using equation (5.4). Indeed 
by choosing y = e& and z = ei, k,l G N m we obtain the following formula 

(K(x,t)) kl = (K x e k ,K tei ). (5.7) 

In particular when n = 1 equation (5.2) becomes f(x) = (K x ,f), which is the standard 
reproducing kernel property. 
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5.1.1 Interpolation problem for vector- valued functions 

Let fi,...f n : R s — ► M be mappings which are observed at sampling points Xi = 
{xi r : 1 < r < Ni} having data sites Dj = {d{ r : 1 < r < Ni} such that fi(xi r ) = d{ r , with 
1 < r < Ni and 1 < i < n. 

What is more we consider the possibility for the functions to be influenced by each other. 
Thus the generalization of the interpolation problem, also incorporating polynomials is: 



n ty Q 

fi( X ) = X] ^2 U Jr ( l ) i,j(x, X jr ) + ^ C UPl( X ) 1 < * < 
j=l r=l 1=1 



II 



(5i 



subject to the side conditions 



^ u jr q(x jr ) = \/q E TT s k _ v l<j<n 



(5.9) 



r=l 



where T^ s k _ 1 is the space of polynomials in 1R S of total degree not exceeding k — 1. Within 
this model the kernel 0^- expresses the influence of the jth component fj on the ith. 
If we interpret the functions fi, 1 < i < n, not separately, but as components of one 
single vector-valued function / : 1R S — ► M. n we can state the interpolation problem for 
vector valued data. 

Problem 5.3 Let n E N and Ni G N for 1 < i < n. Given n sets Xi = 
{xi r : 1 < r < Ni} C M s of data points and n vectors of values observed at these points 
di G M. Ni , solving the interpolation problem 

fi(x ir ) = d ir 1 < r < N h 1 < i < n 

we obtain the linear system: 



/^l,l 


1pl,2 ■ ■ 


■ 4>l,n 


Pi 





4>2,1 


4>2,2 ■ ■ 


■ 4>2,n 





P2 


4>n,l 


4>n,2 ■ ■ 


Wn,n 








PI 


o ■■ 


■ 











P T ■ ■ 

r 2 


■ o 








\ o 


o ■■ 


■ P T 











• °\ 




(uA 




( dl \ 




■ o 




u 2 




d 2 




■ Pn 




U n 




Cl n 




■ o 




Cl 









■ o 




c 2 









■ o) 




\Cn) 




{o) 



(5.10) 



where 
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ipij = [<l>iAx itr ,x jia )]?$ G R N ^ 1 < i,j < n 

Pi = \pi{*ir)\rji£. £ ^ XQ 1<1<U 

u^M^e^' \<i<n (5.11) 

(k = [cu]ti e R Q 1 < i < n 

di = [d ir ]r=i eR w ' 1 < i < n 

We can show that under some conditions on the matrix of kernels the interpolation prob- 
lem has a unique solution. Therefore, we have to transfer the notion of conditionally 
positive definiteness into the setting of matrices of kernels. 
We define the following subspace: 

V X ,k = {\p(Xr)\r=l ■ P £ K-l} C R N (5.12) 

where X = {x r : 1 < r < A^} C R s is a unisolvent set of points. 

Thus, the dimension of Vx,k is equal to the dimension of 7r^_ 1 . 

We can now give the definition of matrix conditionally positive definite kernels. 

Definition 5.2 An n x n matrix of kernels \l/ = [0i,j]",- = i with <f>ij : M s x M s — > R is said 
to be matrix conditionally positive definite of order k on R s if for any n sets of distinct 
points Xi = {x ir : 1 < r < Ni} C R s the matrix v|/ = [4>ij]^j =1 , where 

^i,j = [^(v.%)i5=i e RNtXN > i < ho < n 

is conditionally positive semidefinite with respect to the subspace Vx x ,k x • • • x Vx n ,k- The 
matrix of kernels is said to be strictly matrix conditionally positive definite of order k on 
W if for any n sets of distinct points X\,...,X n the matrix v|/ is conditionally positive 
definite with respect to Vx l ,k x • • • x Vx n ,k- 

where \I/ is the vector-valued kernel, fcj is the scalar-valued kernel, vj/ is the evaluation of 
$ ina proper matrix of distances. 

The following proposition shows that strictly matrix conditionally positive definiteness 
is sufficient to guarantee existence and uniqueness of the solution of the corresponding 
interpolation problems. 

Proposition 5.1 Let n G N and suppose that the n x n matrix of kernels ty = [0jj]f,- =1 
is strictly conditionally positive definite of order k on R s . Suppose that the n sets X t = 
{x^ : 1 < r < Ni} of distinct points o/R s are all unisolvent for n s k _ 1 . Then the matrix of 
(5.10) is invertible and the interpolation problem has a unique solution for any given set 
of values di G R^*, 1 < i < n. 



5.1. Reproducing Kernel Hilbert Spaces of vector-valued functions 
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Proof 

Consider the homogeneous system corresponding to Problem (5.3). Let u T = [uj, ...,u^], 
c T = [cf,...,c^] and [u T , c T ] be a solution of the homogeneous system. Then, from the 
lower part of the block system (5.10) we have that 



Pim 



1 < i < n 



(5.13) 



which shows that Ui belongs to Vj^ fc , for 1 < i < n. From the first rows of the block 
system (5.10) we obtain that 



A/>i,i ••• Vv\ fuA McA /o\ 

\4>n,l ■■■ 4>n,nJ \U n J \PnC n ) \0/ 

Multiplying on the left by u T leads to 



(5.14) 



Yl Yl <^ u 3 + H( p J u i) Tc i = °- 



(5.15) 



Thus, condition (5.13) implies that u T vJ/u = 0. However, by hypothesis the matrix 

VJ/ = (A,j)l j= l = ([^A^r^j,s)Z =1 )h=l (5-16) 

is positive definite on the subspace V^ k x • • • x V^ k- 

Hence the quadratic form can only be zero if u = 0. Thus the block system (5.14) reduces 

to 

P jCj = 0, j = l,...,n. (5.17) 

Now since the sets X; are unisolvent, we conclude that the vectors Cj, 1 < i < n are zero. 
Therefore, the only solution of the homogeneous system is the trivial solution, and the 
matrix of Problem (5.3) is invertible. 

□ 

We have shown that existence and uniqueness of the solution to the interpolation Problem 
5.3 is guaranteed for strictly matrix conditionally positive definite kernels. 
We now derive sufficient conditions on the matrix component kernels 
which provide strictly matrix conditionally positive definite kernels. 
For this purpose we introduce the notation (f)(x,y) = 4>(y,x). 



"1,3 ■ 



Theorem 5.4 Let n G N and 



"i,i ■ 



1 < hJ ' < n , be kernels. Suppose that: 



1. the matrix S = [o"ij]"o = i is symmetric with entries in {±1}, 

2. the kernels o~i,j{(j)ij + 4>i,j) are symmetric and conditionally positive definite of order 
k for all 1 < i ^ j < n, and 



72 Chapter 5. Application to biological ODEs systems 

3. the kernels (2(f)i,i — J2j^i IJ i,j(, ( l ) i,j + ( l ) i,j)) are strictly conditionally positive definite of 
order k for all 1 < i < n. 

Then the n x n matrix of kernels $ = [0i,j]",- = i is strictly conditionally positive definite 
of order k on R s . 

The theorem proof makes use of two lemma. 

Lemma 5.1 For any matrix G G R nxn conditionally (positive or negative) semidefinite 
with respect to subspace V C R n we have that 

\u T Gv + v T Gu\<\u T Gu + v T Gv\, for all u,v e V ± . (5.18) 

With this lemma we can show positive definiteness of a block matrix with the same 
structure as the upper left block of the matrix of system (5.10). 

Lemma 5.2 Let A^ G M 7Vx7V for 1 < i,j < n and let V be a subspace ofR N . Suppose 
that 

1. the matrix S = [ci,j]™,- = i is symmetric with entries in {±1}, 

2. the matrices Oi^A^ + AfA are symmetric and conditionally positive semidefinite 
with respect to V for j ^ i, and 

3. the matrices (2^4^ — ^2,j^ zi o'i ) j{Aij + AfA) are conditionally positive definite with 
respect to V for 1 < i < n. 

Then the matrix 

A ■= [A i3 ]l J=1 G R nJVxnJV 

is conditionally positive definite with respect to the subspace V n C M. nN . 

We observe that, for the case N = 1 in Lemma 5.2 with V = {0} 2 is automatically fulfilled 
and 3 is a statement about the dominance of diagonal entry An. Thus, the lemma can be 
interpreted as a Gershgorin type statement for block matrices. 

In what follows we are interested in matrix kernels defined by the product of a single 
scalar-valued kernel with a matrix of mixture coefficients. We state a theorem, that in 
this particular case gives a sufficient condition for matrix conditional positive definiteness. 

Theorem 5.5 Let C G M. nxn be a positive semidefinite matrix and ^ : I s x R s -» R a 
conditionally positive definite kernel of order k. Furthermore, let at least one of C or $ 
be symmetric. Then 

* = Cvj/ = [c i:j ip]l j=1 :R s xR s ^ R nxn 

is a matrix conditionally positive definite kernel of order k. If C is positive definite and 
tf> : IR S x 1R S — ► R is strictly conditionally positive definite of order k, then \& is strictly 
matrix conditionally positive definite of order k. 
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We remark that we do not expect the conclusion of Theorem 5.5 to hold if neither vj/ nor 
C is symmetric. However, sometimes the use of a non-symmetric matrix C gives better 
results than in the symmetric case (see [19]). 

5.2 Solution of ODEs systems 

In the following we show the results obtained by applying the indirect approach (IRBFN) 
to a system of ODEs. The solution technique is based on the vector- valued kernel theory, 
developed in the previous section. 
In all our numerical experiments we consider a matrix C such that: 

C = (J °) . (5.19) 

According to Theorem (5.5) the matrix C should be positive definite to guarantee matrix 
kernel strictly conditionally positive definiteness. If the matrix is symmetric, by applying 
Schur complement condition for positive definiteness we obtain that af3 < 1. If we require 
only conditionally positive definiteness, for our matrix, then we can consider a non strict 
inequality a(3 < 1. 

However, it is also true that if a matrix is symmetric and diagonally dominant then is 
positive semidefinite. Moreover if a matrix is diagonally dominant there is no need for 
(partial) pivoting when performing Gaussian elimination and we are sure that the Jacobi 
and Gauss Seidel methods converge. Thus, we ask \a + /3\ < 2. 

The first example is a system of two coupled differential equations of the first order: 

C ;= 3x + y ^ (5.20) 

With initial conditions x(0) = 1 and y(0) =0. 

Clearly, it is possible to analytically determine the solution of (5.20) that is: 



x= (l-t)e 4t 

y= (-t)e 



(5.21) 



In order to solve the system, we have to substitute the expression, which defines the 
network, in both the equations of (5.20), as we did for the differential equation (4.2). 
Then, we have to write the system in a matrix form like (5.10). Finally we multiply this 
system for the matrix C. 

The solution of the system (5.20) and the study of the parameter for the matrix C 
is implemented in the Matlab script OdeSystem.m. Since it is possible to analytically 
compute the solution we can calculate the /2-error comparing the approximation with the 
exact solution. 
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We choose randomly a, /? G [—0.9,0.9]. For symmetric case, we mean that a = /3. 

In Table 5.1 we display the minimum of the error for different choices of the parameters, 

while in Figure 5.1 we display the error plots. 

Table 5.1: Error solution ODEs system 





x-component 


y- component 


Non- symmetric 


9.6e + 00 


9.5e + 01 


Symmetric 


1.2e + 01 


9.5e + 01 



In Figure 5.2 we show the approximation of the function when a = j3 = 1. In this case 
the / 2 -error is displayed in Table 5.2: 

Table 5.2: Error solution ODEs system 



x-component 


y-component 


9.3e - 03 


2.6e-02 



Figure 5.2: Approximation of the solution with a = f3 = 1 
(a) Approximation of the x-component (b) Approximation of the y-component 



Approximation of a system of ODEs solution 



-exact 
-approximation 




Approximation of a system of ODEs solution 







exact 
— approximation 

-0.8 -0.6 -0.4 -0.2 0.2 0.4 0.6 

target points 



If a = (3 = 1 the condition on matrix C to be diagonally dominant is not satisfied. However 
the matrix C is semidefinite positive and thus from theorem 5.5 we are guaranteed to have 
a positive definite kernel matrix. 
Moreover, this choice of the parameter completely reflects how the equations are coupled. 
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Figure 5.1: 3D plots of the error for different choices of a and f3 

(a) Error of the x-component non-symmetric case (b) Error of the y-component non-symmetric case 
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(c) Error of the x-component symmetric case 



(d) Error of the y-component symmetric case 
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5.2.1 Biological Model 

In this section we apply our numerical method for the solution of ODEs, to a biological 
model for diabetes and insulin therapy. 

Diabetes mellitus is a disease of the glucose regulatory system characterized by fasting 
and/or postprandial hyperglycemia. Different factors are involved in the developing of 
this disease and the model we are taking into account is the union of three different partial 
differential models; the first one describes the insulin storage and secretion (see [20]); the 
second one the glucose and insuline dynamics and finally the third one models /5-cell cycle 
(see [21, 22]). 

In the following we solve the one concerning /3-cell cycle. This is a multi- compartment 
model, which is a mathematical model used for describing the way materials or energies are 
transmitted among the compartments of a system. In our specific case the compartments 
are represented by the cell cycle phases and the transmission is represented by the different 
transition rates. 

/3-cells are responsible for the production and the store of insulin. They are directly 

connected with the blood circuit and so they are able to measure the glucose concentration 

with great accuracy. 

The cell-cycle of /3-cells consists of different phases that lead finally to cell division, the 

mitosis. 

In the G\ phase the cell grows and some cell components are added. The S phase is the 

synthetis phase in which the DNA reduplicates. In the G 2 phase the cell prepares for 

mitosis and in the M phase the division of the chromosomes, the nucleus and the cell 

takes place. The two phases G 2 and M are considered together and they form the phase 

G 2 /M. 

Figure 5.3: Cell cycle phases 
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In the model the transition rates ci, c 2 , C3 and the apoptosis rate, which we indicate with 

jja, are also considered. 

Thus, the cell-cycle can be modelled as a three compartmental model: 



dGijt) 
dt 

dS(t) 
dt 



2c 3 -G 2 /M(t)-(c 1 + j j A )-G 1 (t) 
ci • G 1 (t) - c 2 ■ S(t) 



(5.22) 



*±g® =c 2 .S(t)-c 3 .G 1 (t) 



where G\(t), S(t) and G 2 /M(t) are the number densities of cells in G\, S and G 2 /M 
phase respectively. 

Insulin plays an important role in regulating /3-cell mass. To model the effect of insulin 
on the cell cycle we introduce a new transition rate 



c*(t) = c 1 + rl(t) -ci 



(5.23) 



where c\ is the original transition rate, r is a constant which describes the magnitude of 
the insulin effect on the cell cycle and I(t) is the insulin concentration in the blood at 
time t. 
Thus, the model, including the insulin influence on the cell cycle is the following 



dt 



^^ = 2c 3 • G 2 /M{t) - (( Cl + rI(t) Cl ) + n A ) ■ G 1 (t) 
= (c 1 +rI(t)c 1 )-G 1 (t)-c 2 -S(t) 



dS(t) 
dt 



dG 2 /M(t) 
dt 



c 2 ■ S(t) - c 3 ■ d(t) 



We rewrite (5.2.1) in a more synthetic way, by the following substitution: 



G 1 (t) = x 1 (t) 

S(t) = x 2 (t) 

G 2 /M(t) = x 3 (t) 

I(t) = x 5 (t) 

ci =pi 

c 2 =p 2 

c 3 =p3 

Va = P4 
r = p 5 



(5.24) 
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Thus, the system is: 



1 



2P3^3 - (Pl(l +P5X5) +P4)Xl 

Pi(l + p 5 x 5 )xi - p 2 x 2 (5.25) 

p 2 x 2 - P3X3 

We have to estimate X\,X2 and X3. The parameter Pi,P2,P3,P4,P5 are determined through 
model analysis. To obtain a unique solution we should choose a value for the parameter 



£5. We decided to take £5 



pips 



. With this choice the model is in its steady state. 



Figure 5.4: Approximation of the cell_cycle solution 
(a) Approximation of the 2^-component 
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(b) Approximation of the ^-component 



(c) Approximation of the ^-component 
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The implementation is in the Matlab script cell _ cycle _res.m which uses the Matlab 
function cell_ cycle. rn. In Figure 5.4, we compare the exact solution, the approximation 
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obtained by using Matlab solver ODE45 and the solution obtained through IRBFN. In 
Table 5.3 we display the weighted square error (4.6) for the three components by using 
the IRBFN method. 

Table 5.3: Error solution cell-cycle ODEs system 





Xi-component 


^-component 


a^-component 


IRBFN 


2.8e-02 


2.9e - 02 


2.5e - 02 


ODE45 


3.1e-02 


2.8e - 02 


7.1e-03 



5.3 Conclusions 



In this thesis we aimed to study a method based on a Radial Basis Function Network 
(RBFN) to solve a system of ODEs in order to apply it to a biological model for diabetes. 

After introducing the theory of Radial Basis Functions (RBF) and Artificial Neural Net- 
works (ANN), we focused on the application of the RBFN to the simultaneous approx- 
imation of a function and its derivatives. In particular, we provided a deep analysis on 
the functioning of the method by using different kernels. Our original contribution was 
aimed to understand why some RBF provide better accuracy than others. This fact has 
been explained by some observations on the localization properties of different kernels. 
We also noticed that a difference in accuracy may depend on the function we want to 
approximate. Indeed, it is fundamental that this function belongs to the native space of 
the kernel. 

The following step was to use the RBFN to solve an ordinary differential equation of 
arbitrary order. This led to face with the problem of the condition number of the matrix 
which computes the weights during the training phase of the network. 
We found that a scaling of the domain in which the ODE is defined, provides a great 
reduction in the condition number and that, even if this is a very simple technique it has 
been proven to work much better with respect to the Riley method. 

Finally, we reached our purpose of approximating a system of ODEs by a synergistic use 
of the vector valued kernel theory and of RBFN. Indeed, we interpreted the problem of 
the solution of a ODEs system as an interpolation problem of a vector valued function, 
whose components represent the differential equations which compose the system. 
In order to obtain a well posed interpolation problem, we have introduced the concept of 
a strictly conditionally positive kernel matrix. If we want to obtain such a kernel matrix 
from a strictly conditionally positive definite scalar valued kernel 0, it is sufficient to 
multiply the matrix, obtained as the evaluation of the kernel <f) m the matrix of distances, 
for a strictly positive definite matrix of coefficients, which we indicate with C. 
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We made a parametric analysis in order to determine the coefficients of the matrix C on 
the basis that a strictly diagonally dominant matrix is also positive definite. We then 
applied the RBFN to the solution of a system of two coupled differential equations and 
to the model for the /3-cell cycle. 

It is worth noting that, even if the error in the approximation by using RBFN is almost 
of the same order of the Matlab toolbox ODE45, our method is not a black box for the 
solution of ordinary differential equation. We expect that by improving the parameter 
analysis and also by using different types of kernels our results would improve. 

5.3.1 Future directions 

In all our numerical experiments we did not pay enough attention to the choice of the 

centers and of the data sites, and we have always considered equally spaced centers and 

target points. However, since a different location of the points may reduce the error in 

the boundary of the domain or may lead to an increase in the interpolation accuracy, this 

could be object of further studies. 

Another limitation of the work is in the numerical solution of the linear system which 

appears in the determination of the weights: it may be possible to make the process 

faster by using different basis as Newton basis. 

Moreover, a lot of work has to be done in order to improve the method for solving the 

system of ODEs. 

• by changing the method for the parameter analysis of the matrix C; 

• by using different kernels to determine the vector- valued kernel matrix and not only 
the Multiquadrics; 

• by applying the method to the whole system of differential equations. 

This investigation will be the main direction the authors will pursue in the near future. 



Appendix A 

DRBFN and IRBFN in two dimensions 



For the sake of completeness, we show that DRBFN and IRBFN can be also applied in 
the two dimensional case. 

Concerning the DRBFN, there are no differences with respect to the one dimensional case. 
Thus, we concentrate on the IRBFN. 

Let us take / : fl C R 2 — ► R, we approximate / and all its derivatives with respect to 
variable X\ and rr 2 . 

As in the one dimensional case, the key idea of the method is that we can simultaneously 
approximate a function and its derivatives up to order n, simply approximating the nth 
derivative with (3.9) and then computing n integrations. 
However, the way we compute the basis functions is slightly different. 

3 2 

For example, consider /(#i,£ 2 ) = x\%2 + ~f + ~f m the square [— 3,3] 2 . We denote with 
x = [#i, £ 2 ] T the data sites column vector and with c = [c\, c 2 ] T the centers column vector. 
We suppose there are m centers and n data sites. 

We approximate the second partial derivative of function / with respect to the variable 
ii as follows: 



^WiQfa-Ci) Vj = l,...,n (A.l) 



dxf 



Then, the first partial derivative with respect to variable x\ is 



dffa) f a 2 /(x. 



v j/ 



dx\ 



dx\ J dxf 

^Wi / $(xj - c i )dx 1 
i=i •* 

in 
^Wi^(Xj) + Ci(x 2 ) 



1=1 



(A.2) 



where H is the first integration of the kernel $(x) with respect to the variable x\. 



82 



Appendix A. DRBFN and IRBFN in two dimensions 



Finally, the function estimate is 



/(*i 



a/to 



dxi 



m „ 

2_, W i \ H(x.j)dxi 

m 

7J WjS(x.j) + Ci(x 2 )xi + C 2 (x 2 ) 



1=1 



(A.3) 



where i/ is the second integration of the kernel $(x) with respect to the variable x\. 

Thus, we have to determine the vector of the weights Wi, but we have also to approximate 
the functions C\{x 2 ) and C 2 (x 2 ). This can be done by using the one dimensional IRBFN, 
i.e. considering the following formula: 



M 



Ci (x 2 ) = ^ WiH (x 2 ) + K x x 2 + K 2 



(A.4) 



i=i 



where H is the second integration of the one-dimensional kernel $(x 2 ) and K\ and K 2 are 
the integration constants of the one- dimensional IRBFN (see 3.12). 

An analogous formula can be used for the function C 2 (x 2 ). 

As in the one-dimensional case, we are interested in using Gaussians, Inverse Multiquadrics 
and Multiquadrics as kernels. 

The implementations are in the Matlab functions DRBFN 2Dtest.m and IRBFN2Dtest.m 
(which are used by the Matlab GUI mygui.m) which test the network by using the Matlab 
functions DRBFN2D.m and IRBFN2D.m to compute the weights. 

In Table A.l we display the / 2 -error for different kernels and for the approximation of the 
functions and its first and second partial derivatives with respect to variable x\ by using 
DRBFN; in Table A.2 we show the / 2 -error for IRBFN. 



Table A.l: Error of DRBFN method 





/ 


df/dxi 


d 2 f/dx\ 


G 


Lie -02 


l.Oe - 02 


l.Oe -01 


IM 


6.0e - 03 


1.6e - 02 


2.4e - 01 


M 


4.0e - 03 


1.5e - 02 


1.9e-01 
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Table A.2: Error of IRBFN method 





/ 


df/dxi 


d 2 f/dx 2 


G 


2.1e-03 


l.Oe - 03 


2.3e - 03 


IM 


6.5e - 03 


1.2e - 03 


2.8e - 03 


M 


8.3e - 04 


4.9e - 04 


6.1e - 04 



In Table A. 3 we display the fo-error for different kernels and for the approximation of the 
functions and its first and second partial derivatives with respect to variable X2 by using 
DRBFN; in Table A. 4 we show the / 2 -error for IRBFN. 

Table A.3: Error of DRBFN method 





/ 


df/dx 2 


d 2 f/dx 2 


G 


Lie -02 


4.6e - 03 


4.8e - 02 


IM 


6.0e - 03 


2.4e - 02 


3.5e - 01 


M 


4.0e - 03 


Lie -02 


1.3e-01 



Table A.4: Error of IRBFN method 





/ 


df/dx 2 


d 2 f/dx 2 2 


G 


3.5e - 04 


6.4e - 04 


5.6e - 03 


IM 


2.3e - 03 


9.0e - 04 


Lie -02 


M 


1.4e - 03 


7.3e - 04 


1.3e - 03 



Appendix B 

A GUI interface with Matlab 



We have developed a GUI interface with Matlab which displays the plots for various 
functions approximation with DRBFN and IRBFN even in the two dimensional case. 



Figure B.l: A Matlab GUI 
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The implementations are in the Matlab functions MayDRBFNtestGUI.m, MaylRBFN- 
testGUI.m, MayDRBFN.m, MaylRBFN.m, DRBFN2Dtest.m, IRBFN 2Dtest.m } 
DRBFN2D.m, IRBFN2D.m and in the Matlab script mygui.m. 
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